Title of the Invention

NUCLEOTIDE AND DEDUCED AMINO ACID
SEQUENCES OF THE ENVELOPE 1 AND CORE
GENES OF ISOLATES OF HEPATITIS C VIRUS
AND THE USE OF REAGENTS DERIVED FROM
THESE SEQUENCES IN DIAGNOSTIC METHODS
AND VACCINES

The present application is a divisional 
application of pending U.S. Application Serial No. 
08/290,665, filed August 15, 1994, which is a continuation-in-part
of U.S. Application Serial No. 08/086,428, filed on
June 29, 1993, now U.S. Patent No. 5,514,539. 

Field Of Invention 

The present invention is in the field of 
hepatitis virology. The invention relates to the complete 
nucleotide and deduced amino acid sequences of the envelope
1 (E1) and core genes of hepatitis C virus (HCV) isolates
from around the world and the grouping of these isolates
into fourteen distinct HCV genotypes. More specifically,
this invention relates to oligonucleotides, peptides and
recombinant proteins derived from the envelope 1 and core
gene sequences of these isolates of hepatitis C virus and
to diagnostic methods and vaccines which employ these
reagents.

Background Of Invention

Hepatitis C, originally called non-A, non-B 
hepatitis, was first described in 1975 as a disease 
serologically distinct from hepatitis A and hepatitis B 
(Feinstone, S.M. et al. (1975) N. Engl. J. Med. 292:767-770).
Although hepatitis C was (and is) the leading type
of transfusion-associated hepatitis as well as an important
part of community-acquired hepatitis, little progress was
made in understanding the disease until the recent
identification of hepatitis C virus (HCV) as the causative
agent of hepatitis C via the cloning and sequencing of the
HCV genome (Choo, A.L. et al. (1989) Science 244:359-362).
The sequence information generated by this study resulted
in the characterization of HCV as a small, enveloped,
positive-stranded RNA virus and led to the demonstration
that HCV is a major cause of both acute and chronic
hepatitis worldwide (Weiner, A.J. et al. (1990) Lancet
335:1-3). These observations, combined with studies
showing that over 50% of acute cases of hepatitis C 
progress to chronicity with 20% of these resulting in 
cirrhosis and an undetermined proportion progressing to 
liver cancer, have led to tremendous efforts by 
investigators within the hepatitis C field to develop 
diagnostic assays and vaccines which can detect and prevent 
hepatitis C infection. 

The cloning and sequencing of the HCV genome by 
Choo et al. (1989) has permitted the development of
serologic tests which can detect HCV or antibody to HCV
(Kuo, G. et al. (1989) Science 244:362-364). In addition,
the work of Choo et al. has also allowed the development of
methods for detecting HCV infection via amplification of
HCV RNA sequences by reverse transcription and cDNA
polymerase chain reaction (RT-PCR) using primers derived
from the HCV genomic sequence (Weiner, A.J. et al.).
However, although the development of these diagnostic 
methods has resulted in improved diagnosis of HCV 
infection, only approximately 60% of cases of hepatitis C 
are associated with a factor identified as contributing to 
transmission of HCV (Alter, M.J. et al. (1989) JAMA
262:1201-1205). This observation suggests that effective
control of hepatitis C transmission is likely to occur only 
via universal pediatric vaccination as has been initiated 
recently for hepatitis B virus. Unfortunately, attempts to 
date to protect chimpanzees from hepatitis C infection via 
administration of recombinant vaccines have had only 
limited success. Moreover, the apparent genetic 
heterogeneity of HCV, as indicated by the recent assignment
of all available HCV isolates to one of four genotypes, I-IV
(Okamoto, H. et al. (1992) J. Gen. Virol. 73:673-679),
presents additional hurdles which must be overcome in order
to develop accurate and effective diagnostic assays and
vaccines.

For example, one possible obstacle to the 
development of effective hepatitis C vaccines would arise 
if the observed genetic heterogeneity of HCV reflects 
serologic heterogeneity. In such a case, the most 
genetically diverse strains of HCV may then represent 
different serotypes of HCV with the result being that 
infection with one strain may not protect against infection 
with another. Indeed, the inability of one strain to 
protect against infection with another strain was recently 
noted by both Farci et al. (Farci, P. et al. (1992) Science
258:135-140) and Prince et al. (Prince, A.M. et al. (1992)
J. Infect. Dis. 165:438-443), each of whom presented
evidence that while infection with one strain of HCV does 
modify the degree of the hepatitis C associated with the 
reinfection, it does not protect against reinfection with a 
closely related strain. The genetic heterogeneity among 
different HCV strains also increases the difficulty 
encountered in developing RT-PCR assays to detect HCV 
infection since such heterogeneity often results in
false-negative results because of primer and template mismatch.
In addition, currently used serologic tests for detection 
of HCV or for detection of antibody to HCV are not 
sufficiently well developed to detect all of the HCV 
genotypes which might exist in a given blood sample. 
Finally, in terms of choosing the proper treatment modality 
to combat hepatitis infection, the inability of presently 
available serologic assays to distinguish among the various 
genotypes of HCV represents a significant shortcoming in 
that recent reports suggest that an HCV-infected patient's
response to therapy might be related to the genotype of the
infectious virus (Yoshioka, K. et al. (1992) Hepatology
16:293-299; Kanai, K. et al. (1992) Lancet 339:1543; Lau,
J.Y.N. et al. (1992) Hepatology 16:209A). Indeed, the data
presented in the above studies suggest that the closely
related genotypes I and II are less responsive to
interferon therapy than are the closely related genotypes
III and IV. Moreover, preliminary data by Pozzato et al.
(Pozzato, G. et al. (1991) Lancet 338:509) suggest that
different genotypes may be associated with different types 
or degrees of clinical disease. Taken together, these 
studies suggest that before effective vaccines against HCV 
infection can be developed, and indeed, before more 
accurate and effective methods for diagnosis and treatment 
of HCV infection can be produced, one must obtain a greater 
knowledge about the genetic and serologic diversity of HCV 
isolates.

In a recent attempt to gain an understanding of 
the extent of genetic heterogeneity among HCV strains, Bukh
et al. carried out a detailed analysis of HCV isolates via
the use of PCR technology to amplify different regions of
the HCV genome (Bukh, J. et al. (1992a) Proc. Natl. Acad.
Sci. U.S.A. 89:187-191). Following PCR amplification, the
5'-noncoding (5' NC) portion of the genomes of various HCV
isolates was sequenced and it was found that primer pairs
designed from conserved regions of the 5' NC region of the 
HCV genome were more sensitive for detecting the presence 
of HCV than were primer pairs representing other portions 
of the genome (Bukh, J. et al. (1992b) Proc. Natl. Acad.
Sci. U.S.A. 89:4942-4946). In addition, the authors noted
that although many of the HCV isolates examined could be
classified into the four genotypes described by Okamoto et
al. (1992), other previously undescribed genotypes emerged
based on genetic heterogeneity observed in the 5' NC region 
of the various isolates. One of the most prominent of 
these newly noted genotypes comprised a group of related 
viruses that contained the most genetically divergent 5' NC 
regions of those studied. This group of viruses,
tentatively classified as a fifth genotype, is very
similar to strains recently described by others (Cha, T.-A.
et al. (1992) Proc. Natl. Acad. Sci. U.S.A. 89:7144-7148;
Chan, S-W. et al. (1992) J. Gen. Virol. 73:1131-1141 and
Lee, C-H. et al. (1992) J. Clin. Microbiol. 30:1602-1604).
In addition, at least four more putative genotypes were
identified thereby providing evidence that the genetic
heterogeneity of HCV was more extensive than previously
appreciated.

However, while the studies of Bukh et al. (1992a
and b) provided new and useful information on the genetic
heterogeneity of HCV, it is widely appreciated by those
skilled in the art that the three structural genes of HCV,
core (C), envelope (E1) and envelope 2/nonstructural 1
(E2/NS1), are the most important for the development of
serologic diagnostics and vaccines since it is the product
of these genes that constitutes the hepatitis C virion.
Thus, a determination of the nucleotide sequence of one or 
all of the structural genes of a variety of HCV isolates 
would be useful in designing reagents for use in diagnostic 
assays and vaccines since a demonstration of genetic 
heterogeneity in a structural gene(s) of HCV isolates might 
suggest that some of the HCV genotypes represent distinct 
serotypes of HCV based upon the previously observed 
relationship between genetic heterogeneity and serologic 
heterogeneity among another group of single-stranded,
positive-sense RNA viruses, the picornaviruses (Rueckert,
R.R. "Picornaviridae and their replication", in Fields,
B.N. et al., eds. Virology, New York: Raven Press, Ltd.
(1990) 507-548).


Summary of Invention 

The present invention relates to cDNAs encoding 
the complete nucleotide sequence of either the envelope 1
(E1) gene or the core (C) gene of an isolate of human
hepatitis C virus (HCV).

The present invention also relates to the nucleic
acid and deduced amino acid sequences of these E1 and core
cDNAs.

It is an object of this invention to provide 
synthetic nucleic acid sequences capable of directing 
production of recombinant E1 and core proteins, as well as
equivalent natural nucleic acid sequences. Such natural
nucleic acid sequences may be isolated from a cDNA or
genomic library from which the gene capable of directing
synthesis of the E1 or core proteins may be identified and
isolated. For purposes of this application, nucleic acid
sequence refers to RNA, DNA, cDNA or any synthetic variant
thereof which encodes peptides.

The invention also relates to the method of
preparing recombinant E1 and core proteins derived from E1
and core cDNA sequences, respectively, by cloning the nucleic
acid encoding either the recombinant E1 or core protein,
inserting the cDNA into an expression vector and expressing
the recombinant protein in a host cell.

The invention also relates to isolated and 
substantially purified recombinant E1 and core proteins and
analogs thereof encoded by E1 and core cDNAs, respectively.

The invention further relates to the use of
recombinant E1 and core proteins, either alone or in
combination with each other, as diagnostic agents and as
vaccines.

The present invention also relates to the 
recombinant production of the core protein of the present 
invention to contain a second protein on its surface and 
therefore serve as a carrier in a multivalent vaccine 
preparation. Further, the present invention relates to the 
use of the self-aggregating core or envelope proteins as a
drug delivery system for anti-virals. 

The invention also relates to the use of
single-stranded antisense poly- or oligonucleotides derived from
E1 or core cDNAs, or from both E1 and core cDNAs, to
inhibit expression of hepatitis C E1 and/or core genes.

The invention further relates to multiple 
computer-generated alignments of the nucleotide and deduced 
amino acid sequences of the E1 and core cDNAs. These
multiple sequence alignments produce consensus sequences
which serve to highlight regions of homology and
non-homology between sequences found within the same genotype
or in different genotypes and hence, these alignments can 
be used by one skilled in the art to design peptides and 
oligonucleotides useful as reagents in diagnostic assays 
and vaccines. 

The invention therefore also relates to purified 
and isolated peptides and analogs thereof derived from E1
and core cDNA sequences.

The invention further relates to the use of these 
peptides as diagnostic agents and vaccines. 

The present invention also encompasses methods of 
detecting antibodies specific for hepatitis C virus in 
biological samples. The methods of detecting HCV or 
antibodies to HCV disclosed in the present invention are 
useful for diagnosis of infection and disease caused by HCV 
and for monitoring the progression of such disease. Such 
methods are also useful for monitoring the efficacy of 
therapeutic agents during the course of treatment of HCV 
infection and disease in a mammal. 

The invention also provides a kit for the 
detection of antibodies specific for HCV in a biological 
sample where said kit contains at least one purified and 
isolated peptide derived from the E1 or core cDNA
sequences. In addition, the invention provides for a kit
containing at least one purified and isolated peptide
derived from the E1 cDNA sequences and at least one
purified and isolated peptide derived from the core cDNA
sequences.

The invention further provides isolated and
purified genotype-specific oligonucleotides and analogs
thereof derived from E1 and core cDNA sequences.

The invention also relates to methods for 
detecting the presence of hepatitis C virus in a mammal, 
said methods comprising analyzing the RNA of a mammal for 
the presence of hepatitis C virus. The invention further 
relates to methods for determining the genotype of 
hepatitis C virus present in a mammal. This method is 
useful in determining the proper course of treatment for an 
HCV-infected patient.

The invention also provides a diagnostic kit for 
the detection of hepatitis C virus in a biological sample. 
The kit comprises purified and isolated nucleic acid 
sequences useful as primers for reverse-transcription 
polymerase chain reaction (RT-PCR) analysis of RNA for the 
presence of hepatitis C virus genomic RNA. 

The invention further provides a diagnostic kit 
for the determination of the genotype of a hepatitis C 
virus present in a mammal. The kit comprises purified and 
isolated nucleic acid sequences useful as primers for RT- 
PCR analysis of RNA for the presence of HCV in a biological 
sample and purified and isolated nucleic acid sequences 
useful as hybridization probes in determining the genotype 
of the HCV isolate detected in PCR analysis. 

This invention also relates to pharmaceutical 
compositions useful in prevention or treatment of hepatitis 
C in a mammal.


Description of Figures 

Figures 1A-H show computer generated sequence
alignments of the nucleotide sequences of 51 HCV E1 cDNAs.
The single letter abbreviations used for the nucleotides
shown in Figures 1A-H are those standardly used in the art.
Figure 1A shows the alignment of SEQ ID NOs: 1-8 to produce
a consensus sequence for genotype I/1a. Figure 1B shows
the alignment of SEQ ID NOs: 9-25 to produce a consensus
sequence for genotype II/1b. Figure 1C shows the alignment
of SEQ ID NOs: 26-29 to produce a consensus sequence for
genotype III/2a. Figure 1D shows the alignment of SEQ ID
NOs: 30-33 to produce a consensus sequence for genotype
IV/2b. Figure 1E shows the alignment of SEQ ID NOs: 35-39
to produce a consensus sequence for genotype V/3a. Figure
1F shows the computer alignment of SEQ ID NOs: 42-43 to
produce a "consensus" sequence for genotype 4c where the
"consensus" sequence given is that of SEQ ID NO: 42. Figure
1G shows the alignment of SEQ ID NOs: 45-50 to produce a
consensus sequence for genotype 5a. The nucleotides shown
in capital letters in the consensus sequences of Figures
1A-G are those conserved within a genotype while
nucleotides shown in lower case letters in the consensus
sequences are those variable within a genotype. In
addition, in Figures 1A-E and 1G, when the lower case
letter is shown in a consensus sequence, the lower case
letter represents the nucleotide found most frequently in
the sequences aligned to produce the consensus sequence.
In Figure 1F, the lower case letters shown in the consensus
sequence are nucleotides in SEQ ID NO: 42 which differ from
nucleotides found in the same positions in SEQ ID NO: 43.
Finally, a hyphen at a nucleotide position in the consensus
sequences in Figures 1A-G indicates that two nucleotides
were found in equal numbers at that position in the aligned
sequences. In the aligned sequences, nucleotides are shown
in lower case letters if they differed from the nucleotides
of both adjacent isolates. Figure 1H shows the alignment
of the consensus sequences of Figures 1A-G with SEQ ID
NO: 34 (genotype 2c), SEQ ID NO: 40 (genotype 4a), SEQ ID
NO: 41 (genotype 4b), SEQ ID NO: 44 (genotype 4d) and SEQ ID
NO: 51 (genotype 6a) to produce a consensus sequence for all
twelve genotypes. This consensus sequence is shown as the
bottom line of Figure 1H where the nucleotides shown in
capital letters are conserved among all genotypes and a
blank space indicates that the nucleotide at that position
is not conserved among all genotypes.

Figures 2A-H show computer alignments of the
deduced amino acid sequences of 51 HCV E1 cDNAs. The
single letter abbreviations used for the amino acids shown
in Figures 2A-H follow the conventional amino acid
shorthand for the twenty naturally occurring amino acids.
Figure 2A shows the alignment of SEQ ID NOs: 52-59 to
produce a consensus sequence for genotype I/1a. Figure 2B
shows the alignment of SEQ ID NOs: 60-76 to produce a
consensus sequence for genotype II/1b. Figure 2C shows the
alignment of SEQ ID NOs: 77-80 to produce a consensus
sequence for genotype III/2a. Figure 2D shows the
alignment of SEQ ID NOs: 81-84 to produce a consensus
sequence for genotype IV/2b. Figure 2E shows the alignment
of SEQ ID NOs: 86-90 to produce a consensus sequence for
genotype V/3a. Figure 2F shows the computer alignment of
SEQ ID NOs: 93-94 to produce a consensus sequence for
genotype 4c. Figure 2G shows the alignment of SEQ ID
NOs: 96-101 to produce a consensus sequence for genotype 5a.
The amino acids shown in capital letters in the consensus
sequences of Figures 2A-G are those conserved within a
genotype while amino acids shown in lower case letters in
the consensus sequences are those variable within a
genotype. In addition, in Figures 2A-E and 2G, when the
lower case letter is shown in a consensus sequence, the
letter represents the amino acid found most frequently in
the sequences aligned to produce the consensus sequence.
In Figure 2F, the lower case letters shown in the consensus
sequence are amino acids in SEQ ID NO: 93 which differ from
amino acids found in the same positions in SEQ ID NO: 94.
Finally, a hyphen at an amino acid position in the
consensus sequences of Figures 2A-G indicates that two
amino acids were found in equal numbers at that position in
the aligned sequences. In the aligned sequences, amino
acids are shown in lower case letters if they differed from
the amino acids of both adjacent isolates. Figure 2H shows
the alignment of the consensus sequences of Figures 2A-G
with SEQ ID NO: 85 (genotype 2c), SEQ ID NO: 91 (genotype
4a), SEQ ID NO: 92 (genotype 4b), SEQ ID NO: 95 (genotype 4d)
and SEQ ID NO: 102 (genotype 6a) to produce a consensus
sequence for all twelve genotypes. This consensus sequence
is shown as the bottom line of Figure 2H where the amino
acids shown in capital letters are conserved among all
genotypes and a blank space indicates that the amino acid
at that position is not conserved among all genotypes.
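
The within-genotype consensus convention described for Figures 1A-E, 1G, 2A-E and 2G (capital letter when conserved, lower case letter for the most frequent residue, hyphen when two residues occur in equal numbers) can be illustrated with a minimal sketch. The function names and the toy alignment are hypothetical and are not taken from the sequence listing; the sketch assumes gap-free sequences of equal length, treats any tie among the most frequent residues as a hyphen, and does not cover the two-sequence "consensus" of Figures 1F and 2F.

```python
from collections import Counter


def consensus_column(column):
    """Return the consensus symbol for one column of an alignment.

    Upper case: every aligned sequence carries the same symbol.
    Lower case: the position varies; the most frequent symbol is shown.
    Hyphen:     the most frequent symbols occur in equal numbers (assumed
                to generalise the "two in equal numbers" rule).
    """
    counts = Counter(symbol.upper() for symbol in column)
    ranked = counts.most_common()
    if len(ranked) == 1:
        return ranked[0][0]
    if ranked[0][1] == ranked[1][1]:
        return "-"
    return ranked[0][0].lower()


def consensus(aligned):
    """Build a consensus string from equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return "".join(consensus_column(col) for col in zip(*aligned))


# Hypothetical toy alignment of four sequences.
toy = ["ATGCA", "ATGTA", "ATGCC", "ATGTA"]
print(consensus(toy))  # ATG-a
```

In the toy alignment the first three positions are conserved, the fourth is split evenly between C and T, and the fifth is A in three of four sequences.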

Figure 3 shows a multiple sequence alignment of the
deduced amino acid sequence of the E1 gene of 51 HCV
isolates collected worldwide. The consensus sequence of
the E1 protein is shown in boldface (top). In the
consensus sequence cysteine residues are highlighted with
stars, potential N-linked glycosylation sites are
underlined, and invariant amino acids are capitalized,
whereas variable amino acids are shown in lower case
letters. In the alignment, amino acids are shown in lower
case letters if they differed from the amino acid of both
adjacent isolates. Amino acid residues shown in bold print
in the alignment represent residues which at that position
in the amino acid sequence are genotype-specific. Amino
acids that were invariant among all HCV isolates are shown
as hyphens (-) in the alignment. Amino acid positions
correspond to those of the HCV prototype sequence (HCV-1,
Choo, L. et al. (1991) Proc. Natl. Acad. Sci. USA 88:2451-2455)
with the first amino acid of the E1 protein at
position 192. The grouping of isolates into 12 genotypes
(I/1a, II/1b, III/2a, IV/2b, V/3a, 2c, 4a, 4b, 4c, 4d, 5a
and 6a) is indicated.
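
Two of the display conventions above, the lower case marking of residues that differ from both adjacent isolates and the numbering of E1 residues from position 192, can be sketched as follows. The helper names and the three toy rows are hypothetical; the sketch assumes that the first and last rows, which have only one neighbour, are compared against that neighbour alone, and that column indices are zero-based with no gaps.

```python
E1_FIRST_POSITION = 192  # first amino acid of E1 in HCV-1 numbering, per the text


def hcv1_position(column_index):
    """Map a zero-based E1 alignment column to HCV-1 polyprotein numbering."""
    return E1_FIRST_POSITION + column_index


def mark_unique_residues(rows):
    """Lower-case each residue that differs from every adjacent row.

    rows: equal-length aligned sequences in their display order.
    """
    marked = []
    for i, row in enumerate(rows):
        neighbours = [rows[j] for j in (i - 1, i + 1) if 0 <= j < len(rows)]
        out = []
        for pos, residue in enumerate(row):
            differs = bool(neighbours) and all(
                n[pos].upper() != residue.upper() for n in neighbours
            )
            out.append(residue.lower() if differs else residue.upper())
        marked.append("".join(out))
    return marked


# Hypothetical toy rows, not real isolate sequences.
rows = ["YQVRN", "YEVRN", "YEVHN"]
print(mark_unique_residues(rows))  # ['YqVRN', 'YEVRN', 'YEVhN']
print(hcv1_position(0))            # 192
```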

Figure 4 shows a dendrogram of the genetic
relatedness of the twelve genotypes of HCV based on the
percent amino acid identity of the E1 gene of the HCV
genome. The twelve genotypes shown are designated as I/1a,
II/1b, III/2a, IV/2b, V/3a, 2c, 4a, 4b, 4c, 4d, 5a and 6a.
The shaded bars represent a range showing the maximum and
minimum homology between the amino acid sequence of any one
isolate of the genotype indicated and the amino acid
sequence of any other isolate.
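
The range of percent amino acid identity summarized by the shaded bars can be approximated with a short sketch. This is one plausible reading of the description, not the procedure actually used to draw Figure 4: it assumes pre-aligned, equal-length sequences, counts identity position by position, and compares every isolate in one group with every isolate in a second group. The function names and toy sequences are hypothetical.

```python
from itertools import product


def percent_identity(a, b):
    """Percent of aligned positions at which two sequences are identical."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def identity_range(group, others):
    """Minimum and maximum percent identity between two sets of isolates."""
    values = [percent_identity(a, b) for a, b in product(group, others)]
    return min(values), max(values)


# Hypothetical toy sequences standing in for two groups of isolates.
group = ["MKTAY", "MKTAF"]
others = ["MKSAY", "MRSAF"]
print(identity_range(group, others))  # (40.0, 80.0)
```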

Figure 5 shows the distribution of the complete
E1 gene sequence of 74 HCV isolates into the twelve HCV
genotypes in the 12 countries studied. For 51 of these HCV
isolates, including 8 isolates of genotype I/1a, 17
isolates of genotype II/1b and 26 isolates comprising the
additional 10 genotypes, the complete E1 gene sequence was
determined. In the remaining 23 isolates, all of genotypes
I/1a and II/1b, the genotype assignment was based on only a
partial E1 gene sequence. The partially sequenced isolates
did not represent additional genotypes in any of the 12
countries. The number of isolates of a particular genotype
is given in each of the 12 countries studied. For ease of
viewing, those genotypes designated by two terms (e.g.,
I/1a) are indicated by the latter term (e.g., 1a). The
designations used for each country are: Denmark (DK);
Dominican Republic (DR); Germany (D); Hong Kong (HK); India
(IND); Sardinia, Italy (S); Peru (P); South Africa (SA);
Sweden (SW); Taiwan (T); United States (US); and Zaire (Z).
National borders depicted in this figure represent those
existing at the time of sampling.

Figures 6A-K show computer generated sequence
alignments of the nucleotide sequences of 52 HCV core
cDNAs. Single letter abbreviations used for the
nucleotides shown in Figures 6A-K are those standardly used
in the art. Figure 6A shows the alignment of SEQ ID NOs:
103-108 to produce a consensus sequence for genotype I/1a.
Figure 6B shows the alignment of SEQ ID NOs: 109-124 to
produce a consensus sequence for genotype II/1b. Figure 6C
shows the alignments of the sequences comprising minor
genotypes I/1a (SEQ ID NOs: 103-108) and II/1b (SEQ ID NOs:
109-124) to produce a consensus sequence for the major
genotype, genotype 1. Figure 6D shows the alignment of SEQ
ID NOs: 125-128 to produce a consensus sequence for
genotype III/2a. Figure 6E shows the alignment of SEQ ID
NOs: 129-133 to produce a consensus sequence for genotype
IV/2b. Figure 6F shows the alignment of the sequences of
minor genotypes III/2a (SEQ ID NOs: 125-128), IV/2b (SEQ ID
NOs: 129-133) and 2c (SEQ ID NO: 134) to produce a
consensus sequence for the major genotype, genotype 2.
Figure 6G shows the alignment of SEQ ID NOs: 135-138 to
produce a consensus sequence for genotype V/3a. Figure 6H
shows the computer alignment of the sequences of minor
genotypes 4a-4f (SEQ ID NOs: 139-145) to produce a
consensus sequence for the major genotype, genotype 4.
Figure 6I shows the alignment of SEQ ID NOs: 146-153 to
produce a consensus sequence for genotype 5a. The
nucleotides shown in capital letters in the consensus
sequences in Figures 6A-6I are those conserved within the
genotype while nucleotides shown in lower case letters in
the consensus sequences are those variable within a
genotype. In addition, when the lower case letter is shown
in the consensus sequence, the lower case letter represents
the nucleotide found most frequently in the sequences
aligned to produce that consensus sequence. Moreover, a
hyphen at a nucleotide position in the consensus sequences
in Figures 6A-6I indicates that two nucleotides were found
in equal numbers at that position in the sequences aligned
to produce the consensus sequence. Finally, nucleotides
are shown in lower case letters in the sequences aligned to
produce each consensus sequence shown in Figures 6A-6I, if
they differed from the nucleotides of both adjacent
isolates. Figure 6J shows the alignment of the consensus
sequences of major genotypes 1 (Figure 6C), 2 (Figure 6F),
3 (Figure 6G), 4 (Figure 6H), 5 (Figure 6I) and 6 (SEQ ID
NO: 154) to produce a consensus sequence for all genotypes
and Figure 6K shows the alignment of consensus sequences of
Figures 6A, 6B, 6D, 6E, 6G and 6I with SEQ ID NO: 134
(genotype 2c), SEQ ID NO: 139 (genotype 4a), SEQ ID NO: 141
(genotype 4b), SEQ ID NO: 143 (genotype 4c), SEQ ID NO: 145
(genotype 4d), SEQ ID NO: 142 (genotype 4e), SEQ ID NO: 140
(genotype 4f) and SEQ ID NO: 154 (genotype 6a) to produce a
consensus sequence for all fourteen genotypes. The
nucleotides shown in capital letters in the consensus
sequences of Figures 6J and 6K are conserved among all
genotypes and the nucleotides shown in lower case letters
represent the nucleotides found most frequently in the
sequences aligned to produce this consensus sequence. In
addition, the presence of a hyphen at a nucleotide position
in all fourteen sequences aligned in Figure 6K indicates
that the nucleotide found at that position in the aligned
sequences is the same as the nucleotide shown at the
corresponding position in the consensus sequence of Figure
6K.

Figures 7A-7J show computer alignments of the
deduced amino acid sequences of the 52 HCV core cDNAs. The
single letter abbreviations used for the amino acids shown
in Figures 7A-7J follow the conventional amino acid
shorthand for the twenty naturally occurring amino acids. Figure
7A shows the alignment of SEQ ID NOs: 155-160 to produce a
consensus sequence for genotype I/1a. Figure 7B shows the
alignment of SEQ ID NOs: 161-176 to produce a consensus
sequence for genotype II/1b. Figure 7C shows the alignment
of the sequences comprising minor genotypes I/1a (SEQ ID
NOs: 155-160) and II/1b (SEQ ID NOs: 161-176) to produce a
consensus sequence for the major genotype, genotype 1.
Figure 7D shows the alignment of SEQ ID NOs: 177-180 to
produce a consensus sequence for genotype III/2a. Figure
7E shows the alignment of SEQ ID NOs: 181-185 to produce a
consensus sequence for genotype IV/2b. Figure 7F shows the
alignment of the sequences of minor genotypes III/2a (SEQ
ID NOs: 177-180), IV/2b (SEQ ID NOs: 181-185) and 2c (SEQ
ID NO: 186) to produce a consensus sequence for the major
genotype, genotype 2. Figure 7G shows the alignment of SEQ
ID NOs: 187-190 to produce a consensus sequence for
genotype V/3a. Figure 7H shows the computer alignment of
the sequences of minor genotypes 4a-4f (SEQ ID NOs: 191-197)
to produce a consensus sequence for the major
genotype, genotype 4. Figure 7I shows the alignment of SEQ
ID NOs: 198-205 to produce a consensus sequence for
genotype 5a. The amino acids shown in capital letters in
the consensus sequences of Figures 7A-7I are those
conserved within the genotype while amino acids shown in
lower case letters in the consensus sequences are those
variable within the genotype. In addition, when a lower
case letter is found in the consensus sequences shown in
Figures 7A-7I, the letter represents the amino acid found
most frequently in the sequences aligned to produce that
consensus sequence. Moreover, a hyphen in an amino acid
position in the consensus sequences of Figures 7A-7I
indicates that two amino acids were found in equal numbers
at that position in the sequences aligned to produce that
consensus sequence. Finally, amino acids are shown in
lower case letters in the sequences aligned to produce the
consensus sequences shown in Figures 7A-7I if these amino
acids differed from the amino acids of both adjacent
isolates. Figure 7J shows the alignment of the consensus
sequences of major genotypes 1 (Figure 7C), 2 (Figure 7F),
3 (Figure 7G), 4 (Figure 7H), 5 (Figure 7I) and 6 (SEQ ID
NO: 154) to produce a consensus sequence for all genotypes
and Figure 7K shows the alignment of the consensus
sequences of Figures 7A, 7B, 7D, 7E, 7G and 7I with SEQ ID
NO: 186 (genotype 2c), SEQ ID NO: 191 (genotype 4a), SEQ ID
NO: 193 (genotype 4b), SEQ ID NO: 195 (genotype 4c), SEQ ID
NO: 197 (genotype 4d), SEQ ID NO: 194 (genotype 4e), SEQ ID
NO: 192 (genotype 4f) and SEQ ID NO: 206 (genotype 6a) to
produce a consensus sequence for all fourteen genotypes.
The amino acids shown in capital letters in the consensus
sequences shown in Figures 7J and 7K are conserved among
all genotypes while the amino acids shown in lower case
letters represent amino acids found most frequently in the
sequences aligned to produce this consensus sequence. In
addition, the presence of a hyphen at an amino acid
position in all fourteen sequences aligned in Figure 7K
indicates that the amino acid found at that position in the
aligned sequences is the same as the amino acid shown at
the corresponding position in the consensus sequence of
Figure 7K.

Figure 8 shows phylogenetic trees illustrating
the calculated evolutionary relationships of the different
HCV isolates based upon the C gene sequence of 52 HCV
isolates and the E1 gene sequence of 51 HCV isolates,
respectively. The phylogenetic trees were constructed by
the unweighted pair-group method with arithmetic mean (Nei,
M. (1987) Molecular Evolutionary Genetics (Columbia
University Press, New York, N.Y.), pp 287-326) using the
computer software package "Gene Works" from
IntelliGenetics. The lengths of the horizontal lines
connecting the sequences, given in absolute values from 0
to 1, are proportional to the estimated genetic distances
between the sequences. Genotype designations of HCV
isolates are indicated. In 45 HCV isolates, both the C and
the E1 gene sequences were determined.
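
For readers unfamiliar with the unweighted pair-group method with arithmetic mean (UPGMA), the following is a minimal textbook sketch of the clustering step. It is not the "Gene Works" implementation, whose internals are not described here; the function name, the tree representation and the toy distances are hypothetical. Each merge joins the two closest clusters at a height of half their distance, and distances to the new cluster are averaged with weights equal to the cluster sizes.

```python
from itertools import combinations


def upgma(labels, distance):
    """Build a UPGMA tree.

    labels:   names of the sequences.
    distance: dict mapping frozenset({a, b}) to the pairwise distance.
    Returns nested tuples of the form (left, right, height).
    """
    # Active clusters: key -> (subtree, number of leaves).
    active = {label: (label, 1) for label in labels}
    dist = {frozenset(p): distance[frozenset(p)] for p in combinations(labels, 2)}
    next_id = 0
    while len(active) > 1:
        # Find the closest pair of active clusters.
        a, b = min(combinations(active, 2), key=lambda p: dist[frozenset(p)])
        height = dist[frozenset((a, b))] / 2.0
        tree_a, n_a = active.pop(a)
        tree_b, n_b = active.pop(b)
        new = ("node", next_id)
        next_id += 1
        # Size-weighted average distance from the merged cluster to the rest.
        for c in active:
            dist[frozenset((new, c))] = (
                n_a * dist[frozenset((a, c))] + n_b * dist[frozenset((b, c))]
            ) / (n_a + n_b)
        active[new] = ((tree_a, tree_b, height), n_a + n_b)
    tree, _ = next(iter(active.values()))
    return tree


# Hypothetical distances between three sequences on a 0-to-1 scale.
d = {
    frozenset(("A", "B")): 0.25,
    frozenset(("A", "C")): 0.5,
    frozenset(("B", "C")): 0.5,
}
print(upgma(["A", "B", "C"], d))  # ('C', ('A', 'B', 0.125), 0.25)
```

In the toy example A and B are joined first at height 0.125, and C joins that pair at height 0.25, which is the kind of branch length a dendrogram of this type displays.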

Detailed Description Of Invention 

The present invention relates to cDNAs encoding 
the complete nucleotide sequence of the envelope 1 (E1) and
core genes of isolates of human hepatitis C virus (HCV).

The E1 cDNAs of the present invention were obtained as
follows. Viral RNA was extracted from serum collected from
humans infected with hepatitis C virus and the viral RNA
was then reverse transcribed and amplified by polymerase
chain reaction using primers deduced from the sequence of
the HCV strain H-77 (Ogata, N. et al. (1991) Proc. Natl.
Acad. Sci. U.S.A. 88:3392-3396). The amplified cDNA was
then isolated by gel electrophoresis and sequenced.

The present invention further relates to the
nucleotide sequences of the cDNAs encoding the E1 gene of
51 HCV isolates. These nucleotide sequences are shown in
the sequence listing as SEQ ID NO: 1 through SEQ ID NO: 51.

The abbreviations used for the nucleotides are
those standardly used in the art.

The deduced amino acid sequences of SEQ ID
NO: 1 through SEQ ID NO: 51 are presented in the sequence
listing as SEQ ID NO: 52 through SEQ ID NO: 102 where the
amino acid sequence in SEQ ID NO: 52 is deduced from the
nucleotide sequence shown in SEQ ID NO: 1, the amino acid
sequence shown in SEQ ID NO: 53 is deduced from the
nucleotide sequence shown in SEQ ID NO: 2 and so on. The
deduced amino acid sequence of each of SEQ ID NOs: 52-102
starts at nucleotide 1 of the corresponding nucleic acid
sequence shown in SEQ ID NOs: 1-51 and extends 575
nucleotides to a total length of 576 nucleotides.

The three letter abbreviations used in SEQ ID
NOs: 52-102 follow the conventional amino acid shorthand for
the twenty naturally occurring amino acids.

The present invention also relates to the 
nucleotide sequences of the cDNAs encoding the core gene of 
52 HCV isolates. These nucleotide sequences are shown in 
the sequence listing as SEQ ID NO: 103 through SEQ ID 
NO: 154. 

The core cDNAs of the present invention were
obtained as follows. Viral RNA was extracted from serum
and reverse transcribed as described above for cloning of
the E1 cDNAs. The core cDNAs of the present invention were
then amplified by polymerase chain reaction using primers
deduced from previously determined sequences that flank the
core gene (Bukh et al. (1992) Proc. Natl. Acad. Sci.
U.S.A. 89:4942-4946; Bukh et al. (1993) Proc. Natl. Acad.
Sci. U.S.A. 90:8234-8238).

The deduced amino acid sequences of SEQ ID
NO: 103 through SEQ ID NO: 154 are presented in the sequence
listing as SEQ ID NO: 155 through SEQ ID NO: 206 where the
amino acid sequence in SEQ ID NO: 155 is deduced from the
nucleotide sequence shown in SEQ ID NO: 103, the amino acid
sequence shown in SEQ ID NO: 156 is deduced from the
nucleotide sequence shown in SEQ ID NO: 104 and so on. The
deduced amino acid sequence of each of SEQ ID NOs: 155-206
starts at nucleotide 1 of the corresponding nucleotide
sequence shown in SEQ ID NOs: 103-154 and extends 572
nucleotides to a total length of 573 nucleotides.
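
The numbering scheme of the sequence listing, and the reading-frame arithmetic implied by the stated lengths, can be checked with a short Python sketch. The function and constant names are hypothetical; only the SEQ ID ranges and the lengths of 576 and 573 nucleotides come from the text.

```python
E1_NT_IDS = range(1, 52)        # SEQ ID NOs: 1-51 (E1 cDNAs)
CORE_NT_IDS = range(103, 155)   # SEQ ID NOs: 103-154 (core cDNAs)


def protein_seq_id(nt_seq_id):
    """Return the SEQ ID NO of the amino acid sequence deduced from a cDNA."""
    if nt_seq_id in E1_NT_IDS:
        return nt_seq_id + len(E1_NT_IDS)      # 1 -> 52, ..., 51 -> 102
    if nt_seq_id in CORE_NT_IDS:
        return nt_seq_id + len(CORE_NT_IDS)    # 103 -> 155, ..., 154 -> 206
    raise ValueError(f"SEQ ID NO: {nt_seq_id} is not an E1 or core cDNA")


def codon_count(nt_length):
    """Number of complete codons when translation starts at nucleotide 1."""
    if nt_length % 3:
        raise ValueError("length is not a whole number of codons")
    return nt_length // 3


print(protein_seq_id(1), protein_seq_id(154))  # 52 206
print(codon_count(576), codon_count(573))      # 192 191
```

Both stated lengths are whole multiples of three, so each nucleotide sequence is read in a single frame from nucleotide 1 with no partial codon at the end.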

Preferably, the E1 and core proteins and peptides
of the present invention are substantially homologous to,
and most preferably biologically equivalent to, native HCV
E1 and core proteins and peptides. By "biologically
equivalent" as used throughout the specification and
claims, it is meant that the compositions are
immunogenically equivalent to the native E1 and core
proteins and peptides. The E1 and core proteins and
peptides of the present invention may also stimulate the
production of protective antibodies upon injection into a
mammal that would serve to protect the mammal upon
challenge with HCV. By "substantially homologous" as used
throughout the ensuing specification and claims to describe
E1 and core proteins and peptides, it is meant a degree of
homology in the amino acid sequence of the E1 and core
proteins and peptides to the native E1 and core proteins
and peptides respectively. Preferably the degree of
homology is in excess of 90%, preferably in excess of 95%,
with a particularly preferred group of proteins being in
excess of 99% homologous with the native E1 or core proteins
and peptides.
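
One simple way to put a number on the degree of homology is percent identity over an alignment. The Python sketch below assumes that homology is measured as identity between two pre-aligned, equal-length sequences, which the text does not specify; the function names and toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent of aligned positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def homology_tier(pct):
    """Place a percent identity against the 90 / 95 / 99 thresholds."""
    for threshold in (99, 95, 90):
        if pct > threshold:
            return f">{threshold}%"
    return "not above 90%"


# Toy 192-residue "native" sequence (invented, not from the sequence listing).
native = "MSTNPKPQRKTK" * 16
one_change = "A" + native[1:]          # 1 of 192 residues differs
ten_changes = "A" * 10 + native[10:]   # 10 of 192 residues differ

print(homology_tier(percent_identity(native, one_change)))   # >99%
print(homology_tier(percent_identity(native, ten_changes)))  # >90%
```

For a protein of 192 residues, a single substitution leaves the sequence above the 99 threshold, while ten substitutions drop it below 95.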

Variations are contemplated in the cDNA sequences
shown in SEQ ID NO: 1 through SEQ ID NO: 51 and in SEQ ID
NO: 103 through SEQ ID NO: 154 which will result in a nucleic
acid sequence that is capable of directing production of
analogs of the corresponding protein shown in SEQ ID NO: 52
through SEQ ID NO: 102 and in SEQ ID NO: 155 through SEQ ID
NO: 206. It should be noted that the cDNA sequences set
forth above represent a preferred embodiment of the present
invention. Due to the degeneracy of the genetic code, it
is to be understood that numerous choices of nucleotides
may be made that will lead to a DNA sequence capable of
directing production of the instant protein or its analogs.
As such, DNA sequences which are functionally equivalent to
the sequence set forth above or which are functionally
equivalent to sequences that would direct production of
analogs of the E1 and core proteins produced pursuant to
the amino acid sequences set forth above, are intended to
be encompassed within the present invention.
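
The degeneracy of the genetic code referred to above can be shown with a small Python sketch that translates two different DNA sequences into the same peptide using the standard codon table. The helper name and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA coding sequence, starting at nucleotide 1."""
    if len(dna) % 3:
        raise ValueError("length is not a whole number of codons")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Two different toy coding sequences (invented for illustration).
print(translate("ATGAGCACG"))  # MST
print(translate("ATGTCTACA"))  # MST

# Number of codons that encode serine in the standard code.
print(sum(aa == "S" for aa in CODON_TABLE.values()))  # 6
```

Because several codons encode the same amino acid, many distinct nucleotide sequences direct production of an identical protein.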

The term analog as used throughout the
specification or claims to describe the E1 and core
proteins and peptides of the present invention, includes
any protein or peptide having an amino acid residue
sequence substantially identical to a sequence specifically
shown herein in which one or more residues have been
conservatively substituted with a biologically equivalent
residue. Examples of conservative substitutions include
the substitution of one non-polar (hydrophobic) residue such as
isoleucine, valine, leucine or methionine for another, the
substitution of one polar (hydrophilic) residue for another
such as between arginine and lysine, between glutamine and
asparagine, between glycine and serine, the substitution of
one basic residue such as lysine, arginine or histidine for
another, or the substitution of one acidic residue, such as
aspartic acid or glutamic acid for another.
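
The examples of conservative substitution listed above can be encoded as a simple lookup. The Python sketch below covers only the residues named in the text, in one-letter code; the grouping structure and function name are assumptions made for illustration and do not define the full scope of the term.

```python
# Residue groups named in the text (one-letter amino acid codes).
GROUPS = {
    "non-polar": set("IVLM"),   # isoleucine, valine, leucine, methionine
    "basic": set("KRH"),        # lysine, arginine, histidine
    "acidic": set("DE"),        # aspartic acid, glutamic acid
}
# Polar (hydrophilic) pairs named explicitly in the text.
POLAR_PAIRS = [set("RK"), set("QN"), set("GS")]


def is_conservative(a, b):
    """True if replacing residue a with b matches one of the listed examples."""
    if a == b:
        return False  # identical residues: no substitution took place
    if any(a in group and b in group for group in GROUPS.values()):
        return True
    return {a, b} in POLAR_PAIRS


print(is_conservative("I", "V"))  # True
print(is_conservative("Q", "N"))  # True
print(is_conservative("I", "D"))  # False
```

A fuller treatment would also have to test biological equivalence, which the text makes a condition of the definition and which a lookup table cannot capture.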

The phrase "conservative substitution" also 
includes the use of a chemically derivatized residue in 
place of a non-derivatized residue provided that the 
resulting protein or peptide is biologically equivalent to 
the native E1 or core protein or peptide.

"Chemical derivative" refers to an E1 or core
protein or peptide having one or more residues chemically
derivatized by reaction of a functional side group.
Examples of such derivatized molecules include, but are not
limited to, those molecules in which free amino groups have
been derivatized to form amine hydrochlorides, p-toluene
sulfonyl groups, carbobenzoxy groups, t-butyloxycarbonyl
groups, chloroacetyl groups or formyl groups. Free carboxyl
groups may be derivatized to form salts, methyl and ethyl
esters or other types of esters or hydrazides. Free
hydroxyl groups may be derivatized to form O-acyl or
O-alkyl derivatives. The imidazole nitrogen of histidine may
be derivatized to form N-im-benzylhistidine. Also included
as chemical derivatives are those proteins or peptides
which contain one or more naturally-occurring amino acid
derivatives of the twenty standard amino acids. For
example: 4-hydroxyproline may be substituted for proline;
5-hydroxylysine may be substituted for lysine;
3-methylhistidine may be substituted for histidine;
homoserine may be substituted for serine; and ornithine may
be substituted for lysine. The E1 and core proteins and
peptides of the present invention also include any protein
or peptide having one or more additions and/or deletions of
residues relative to the sequence of a peptide whose
sequence is shown herein, so long as the peptide is
biologically equivalent to the native E1 or core protein or
peptide.

The present invention also includes a recombinant
DNA method for the manufacture of HCV E1 and core proteins.
In this method, natural or synthetic nucleic acid sequences
may be used to direct the production of E1 and core
proteins.

In one embodiment of the invention, the method
comprises:

(a) preparing a nucleic acid sequence
capable of directing a host organism to produce HCV E1 or
core protein;

(b) cloning the nucleic acid sequence into a
vector capable of being transferred into and replicated in
a host organism, such vector containing operational
elements for the nucleic acid sequence;

(c) transferring the vector containing the
nucleic acid and operational elements into a host organism
capable of expressing the protein;

(d) culturing the host organism under conditions 
appropriate for amplification of the vector and expression 
of the protein; and 

(e) harvesting the protein. 

In another embodiment of the invention, the
method for the recombinant DNA synthesis of an HCV E1
protein encoded by any one of the nucleic acid sequences
shown in SEQ ID NOs: 1-51 comprises:

(a) culturing a transformed or transfected host
organism containing a nucleic acid sequence capable of
directing the host organism to produce a protein, under
conditions such that the protein is produced, said protein
exhibiting substantial homology to a native E1 protein
isolated from HCV having the amino acid sequence according
to any one of the amino acid sequences shown in SEQ ID
NOs: 52-102 or combinations thereof.

In one embodiment, the RNA sequence of an HCV
isolate is isolated and converted to cDNA as follows.
Viral RNA is extracted from a biological sample collected
from human subjects infected with hepatitis C and the viral
RNA is then reverse transcribed and amplified by polymerase
chain reaction using primers deduced from the sequence of
HCV strain H-77 (Ogata et al. (1991)). Preferred primer
sequences are shown as SEQ ID NOs: 207-212 in the sequence
listing. Once amplified, the PCR fragments are isolated by
gel electrophoresis and sequenced.

In an alternative embodiment, the above method
may be utilized for the recombinant DNA synthesis of an HCV
core protein encoded by any one of the nucleic acid
sequences shown in SEQ ID NOs: 103-154, where the protein
produced by this method exhibits substantial homology to a
native core protein isolated from HCV having an amino acid
sequence according to any one of the amino acid sequences
shown in SEQ ID NOs: 155-206 or combinations thereof.

The vectors contemplated for use in the present
invention include any vectors into which a nucleic acid
sequence as described above can be inserted, along with any
preferred or required operational elements, and which
vector can then be subsequently transferred into a host
organism and replicated in such organisms. Preferred
vectors are those whose restriction sites have been well
documented and which contain the operational elements
preferred or required for transcription of the nucleic acid
sequence.

The "operational elements" as discussed herein 
include at least one promoter, at least one operator, at 
least one leader sequence, at least one terminator codon, 
and any other DNA sequences necessary or preferred for 
appropriate transcription and subsequent translation of the 
vector nucleic acid. In particular, it is contemplated 
that such vectors will contain at least one origin of 
replication recognized by the host organism along with at 
least one selectable marker and at least one promoter 
sequence capable of initiating transcription of the nucleic 
acid sequence.

In construction of the recombinant expression
vectors of the present invention, it should additionally be
noted that multiple copies of the nucleic acid sequence of
interest (either E1 or core) and its attendant operational
elements may be inserted into each vector. In such an
embodiment, the host organism would produce greater amounts
per vector of the desired E1 or core protein. The number
of multiple copies of the nucleic acid sequence which may
be inserted into the vector is limited only by the ability
of the resultant vector, due to its size, to be transferred
into and replicated and transcribed in an appropriate host
microorganism.

Of course, those skilled in the art would readily
understand that copies of both core and E1 nucleic acid
sequence may be inserted into a single vector such that a
host organism transformed or transfected with said vector
would produce both the desired E1 and core proteins. For
example, a polycistronic vector in which multiple different
E1 and/or core proteins may be expressed from a single
vector is created by placing expression of each protein
under control of an internal ribosomal entry site
(IRES) (Molla, A. et al. Nature, 356:255-257 (1992); Gong,
S.K. et al. J. of Virol., 263:1651-1660 (1989)).

In another embodiment, restriction digest
fragments containing a coding sequence for E1 or core
proteins can be inserted into a suitable expression vector
that functions in prokaryotic or eukaryotic cells. By
suitable is meant that the vector is capable of carrying
and expressing a complete nucleic acid sequence coding for
an E1 or core protein. Preferred expression vectors are
those that function in a eukaryotic cell. Examples of such
vectors include but are not limited to vaccinia virus
vectors, adenovirus or herpes viruses. A preferred vector
is the baculovirus transfer vector, pBlueBac.

In yet another embodiment, the selected 
recombinant expression vector may then be transfected into 
a suitable eukaryotic cell system for purposes of 
expressing the recombinant protein. Such eukaryotic cell 
systems include but are not limited to cell lines such as 
HeLa, MRC-5 or CV-1. A preferred eukaryotic cell system is 
SF9 insect cells. 

The expressed recombinant protein may be detected 
by methods known in the art including, but not limited to, 
Coomassie blue staining and Western blotting. 

The present invention also relates to
substantially purified and isolated recombinant E1 and core
proteins. In one embodiment, the recombinant protein
expressed by the SF9 cells can be obtained as a crude
lysate or it can be purified by standard protein
purification procedures known in the art which may include
differential precipitation, molecular sieve chromatography,
ion-exchange chromatography, isoelectric focusing, gel
electrophoresis and affinity and immunoaffinity
chromatography. The recombinant protein may be purified by
passage through a column containing a resin which has bound
thereto antibodies specific for the open reading frame
(ORF) protein.

The present invention further relates to the use 
of recombinant E1 and core proteins as diagnostic agents
and vaccines. In one embodiment, the expressed recombinant
proteins of this invention can be used in immunoassays for
diagnosing or prognosing hepatitis C in a mammal. For the
purposes of the present invention, "mammal" as used 
throughout the specification and claims, includes, but is 
not limited to humans, chimpanzees, other primates and the 
like. In a preferred embodiment, the immunoassay is useful 
in diagnosing hepatitis C infection in humans. 

Immunoassays of the present invention may be
those commonly used by those skilled in the art including,
but not limited to, radioimmunoassay, Western blot assay,
immunofluorescent assay, enzyme immunoassay,
chemiluminescent assay, immunohistochemical assay,
immunoprecipitation and the like. Standard techniques
known in the art for ELISA are described in Methods in
Immunodiagnosis, 2nd Edition, Rose and Bigazzi, eds., John
Wiley and Sons, 1980 and Campbell et al., Methods of
Immunology, W.A. Benjamin, Inc., 1964, both of which are
incorporated herein by reference. Such assays may be a
direct, indirect, competitive, or noncompetitive
immunoassay as described in the art (Oellerich, M. 1984. J.
Clin. Chem. Clin. Biochem. 22:895-904). Biological samples
appropriate for such detection assays include, but are not
limited to, serum, liver, saliva, lymphocytes or other
mononuclear cells.

In a preferred embodiment, test serum is reacted
with a solid phase reagent having surface-bound recombinant
HCV E1 and/or core protein(s) as antigen(s). The solid
surface reagent can be prepared by known techniques for
attaching protein to solid support material. These
attachment methods include non-specific adsorption of the
protein to the support or covalent attachment of the
protein to a reactive group on the support. After reaction
of the antigen with anti-HCV antibody, unbound serum
components are removed by washing and the antigen-antibody
complex is reacted with a secondary antibody such as
labelled anti-human antibody. The label may be an enzyme
which is detected by incubating the solid support in the
presence of a suitable fluorimetric or colorimetric
reagent. Other detectable labels may also be used, such as
radiolabels or colloidal gold, and the like.

The HCV E1 and/or core proteins and analogs
thereof may be prepared in the form of a kit, alone, or in 
combinations with other reagents such as secondary 
antibodies, for use in immunoassays. 

In yet another embodiment the recombinant E1 and
core proteins or analogs thereof can be used as a vaccine
to protect mammals against challenge with hepatitis C. The
vaccine, which acts as an immunogen, may be a cell, cell
lysate from cells transfected with a recombinant expression
vector or a culture supernatant containing the expressed
protein. Alternatively, the immunogen is a partially or
substantially purified recombinant protein. In yet another
embodiment, the immunogen may be a fusion protein
comprising core protein and a second, non-core protein
joined together such that the core portion of the fusion
protein will aggregate and "trap" the second protein on the
surface of the particle produced by aggregation of the core
protein ("Molecular Biology of the Hepatitis B Virus",
McLachlan, A. (1991) CRC Press, Boca Raton, Fla.).
Alternatively, the core protein could be mixed with the
second protein in vitro to produce particles in which all
or part of the second protein was exposed on the surface of
the particle. Such particles would then serve as a carrier
in a multi-valent vaccine preparation. Second proteins or
parts thereof which could be mixed with or fused to the
core protein include, but are not limited to, HCV E1 and
hepatitis B surface antigen.

While it is possible for the immunogen to be 
administered in a pure or substantially pure form, it is 
preferable to present it as a pharmaceutical composition, 
formulation or preparation. 

The formulations of the present invention, both 
for veterinary and for human use, comprise an immunogen as 
described above, together with one or more pharmaceutically 
acceptable carriers and optionally other therapeutic 
ingredients. The carrier(s) must be "acceptable" in the
sense of being compatible with the other ingredients of the 
formulation and not deleterious to the recipient thereof. 
The formulations may conveniently be presented in unit 
dosage form and may be prepared by any method well-known in 
the pharmaceutical art. 

All methods include the step of bringing into 
association the active ingredient with the carrier which 
constitutes one or more accessory ingredients. In general, 
the formulations are prepared by uniformly and intimately 
bringing into association the active ingredient with liquid 
carriers or finely divided solid carriers or both, and 
then, if necessary, shaping the product into the desired 
formulation.

Formulations suitable for intravenous,
intramuscular, subcutaneous, or intraperitoneal
administration conveniently comprise sterile aqueous
solutions of the active ingredient with solutions which are
preferably isotonic with the blood of the recipient. Such
formulations may be conveniently prepared by dissolving the
solid active ingredient in water containing physiologically
compatible substances such as sodium chloride (e.g.
0.1-2.0 M), glycine, and the like, and having a buffered pH
compatible with physiological conditions to produce an
aqueous solution, and rendering said solution sterile.
These may be present in unit or multi-dose containers, for
example, sealed ampules or vials.

The formulations of the present invention may 
incorporate a stabilizer. Illustrative stabilizers are 
preferably incorporated in an amount of 0.10-10,000 parts 
by weight per part by weight of immunogens. If two or more 
stabilizers are to be used, their total amount is 
preferably within the range specified above. These 
stabilizers are used in aqueous solutions at the 
appropriate concentration and pH. The specific osmotic 
pressure of such aqueous solutions is generally in the 
range of 0.1-3.0 osmoles, preferably in the range of
0.8-1.2. The pH of the aqueous solution is adjusted to be
within the range of 5.0-9.0, preferably within the range of
6-8. In formulating the immunogen of the present 
invention, an anti-adsorption agent may be used. 

Additional pharmaceutical methods may be employed
to control the duration of action. Controlled release
preparations may be achieved through the use of polymers to
complex or adsorb the proteins or their derivatives. The
controlled delivery may be exercised by selecting
appropriate macromolecules (for example polyester,
polyamino acids, polyvinyl pyrrolidone,
ethylenevinylacetate, methylcellulose,
carboxymethylcellulose, or protamine sulfate) and the
concentration of macromolecules as well as the methods of
incorporation in order to control release. Another
possible method to control the duration of action by
controlled-release preparations is to incorporate the
proteins, protein analogs or their functional derivatives,
into particles of a polymeric material such as polyesters,
polyamino acids, hydrogels, poly(lactic acid) or ethylene
vinylacetate copolymers. Alternatively, instead of
incorporating these agents into polymeric particles, it is
possible to entrap these materials in microcapsules
prepared, for example, by coacervation techniques or by
interfacial polymerization, for example,
hydroxymethylcellulose or gelatin-microcapsules and
poly(methylmethacrylate) microcapsules, respectively, or in
colloidal drug delivery systems, for example, liposomes,
albumin microspheres, microemulsions, nanoparticles, and
nanocapsules or in macroemulsions.

When oral preparations are desired, the 
compositions may be combined with typical carriers, such as 
lactose, sucrose, starch, talc, magnesium stearate, 
crystalline cellulose, methyl cellulose, carboxymethyl 
cellulose, glycerin, sodium alginate or gum arabic among 
others.

The E1 and core proteins of the present invention
may also be used as a delivery system for anti-virals to
prevent or attenuate HCV infection in a mammal by utilizing
the property of both proteins to self-aggregate in vitro to
"trap" the antiviral within the particles produced via
aggregation of the core and E1 proteins. Examples of
anti-virals which could be delivered by such a system include,
but are not limited to, antisense DNA or RNAs.

Vaccination can be conducted by conventional
methods. For example, the immunogen or immunogens (e.g.
the E1 protein may be administered alone or in combination
with the E1 proteins derived from other isolates of HCV)
can be used in a suitable diluent such as saline or water,
or complete or incomplete adjuvants. Further, the
immunogen(s) may or may not be bound to a carrier to make
the protein(s) immunogenic. Examples of such carrier
molecules include but are not limited to bovine serum
albumin (BSA), keyhole limpet hemocyanin (KLH), tetanus
toxoid, and the like. The immunogen(s) can be administered
by any route appropriate for antibody production such as
intravenous, intraperitoneal, intramuscular, subcutaneous,
and the like. The immunogen(s) may be administered once or
at periodic intervals until a significant titer of anti-HCV
antibody is produced. The antibody may be detected in the
serum using an immunoassay.

In yet another embodiment, the immunogen may be
a nucleic acid sequence capable of directing host organism
synthesis of E1 and/or core protein(s). Such nucleic acid
sequence may be inserted into a suitable expression vector
by methods known to those skilled in the art. Expression
vectors suitable for producing high efficiency gene
transfer in vivo include retroviral, adenoviral and
vaccinia viral vectors. Operational elements of such
expression vectors are disclosed previously in the present
specification and are known to one skilled in the art.
Such expression vectors can be administered intravenously,
intramuscularly, subcutaneously, intraperitoneally or
orally.

In an alternative embodiment, direct gene
transfer may be accomplished via intramuscular injection
of, for example, plasmid-based eukaryotic expression
vectors containing a nucleic acid sequence capable of
directing host organism synthesis of E1 and/or core
protein(s). Such an approach has previously been utilized
to produce the hepatitis B surface antigen in vivo and
resulted in an antibody response to the surface antigen
(Davis, H.L. et al. (1993) Human Molecular Genetics,
2:1847-1851; see also Davis et al. (1993) Human Gene
Therapy, 4:151-159 and 733-740).

Doses of E1 and/or core protein(s)-encoding
nucleic acid sequence effective to elicit a protective
antibody response against HCV infection range from about 1
to about 500 n g. A more preferred range is about 1 to
about 50 0 µg.

The E1 and/or core proteins and expression
vectors containing a nucleic acid sequence capable of
directing host organism synthesis of E1 and/or core
protein(s) may be supplied in the form of a kit, alone, or
in the form of a pharmaceutical composition as described
above.

The administration of the immunogen(s) of the
present invention may be for either a prophylactic or
therapeutic purpose. When provided prophylactically, the
immunogen(s) is provided in advance of any exposure to HCV
or in advance of any symptoms due to HCV
infection. The prophylactic administration of the
immunogen serves to prevent or attenuate any subsequent
infection of HCV in a mammal. When provided
therapeutically, the immunogen(s) is provided at (or
shortly after) the onset of the infection or at the onset
of any symptom of infection or disease caused by HCV. The
therapeutic administration of the immunogen(s) serves to
attenuate the infection or disease.

In addition to use as a vaccine, the compositions 
can be used to prepare antibodies to HCV E1 and core
proteins. The antibodies can be used directly as antiviral
agents or they may be used in immunoassays disclosed herein
to detect HCV E1 and core proteins present in patient
sera. To prepare antibodies, a host animal is immunized
using the E1 and/or core proteins native to the virus
particle bound to a carrier as described above for
vaccines. The host serum or plasma is collected following
an appropriate time interval to provide a composition
comprising antibodies reactive with the E1 or core protein
of the virus particle. The gamma globulin fraction or the 
IgG antibodies can be obtained, for example, by use of 
saturated ammonium sulfate or DEAE Sephadex, or other 
techniques known to those skilled in the art. The 
antibodies are substantially free of many of the adverse 
side effects which may be associated with other anti-viral 
agents such as drugs. 

The antibody compositions can be made even more 
compatible with the host system by minimizing potential 
adverse immune system responses. This is accomplished by 
removing all or a portion of the Fc portion of a foreign
species antibody or using an antibody of the same species 
as the host animal, for example, the use of antibodies from 
human/human hybridomas. Humanized antibodies (i.e., 
nonimmunogenic in a human) may be produced, for example, by 
replacing an immunogenic portion of an antibody with a 
corresponding, but nonimmunogenic portion (i.e., chimeric 
antibodies). Such chimeric antibodies may contain the
reactive or antigen-binding portion of an antibody from one
species and the Fc portion of an antibody (nonimmunogenic)
from a different species. Examples of chimeric antibodies
include, but are not limited to, non-human mammal-human
chimeras, rodent-human chimeras, murine-human and rat-human
chimeras (Robinson et al., International Patent Application
184,187; Taniguchi M., European Patent Application 171,496;
Morrison et al., European Patent Application 173,494;
Neuberger et al., PCT Application WO 86/01533; Cabilly et
al., 1987 Proc. Natl. Acad. Sci. USA 84:3439; Nishimura et
al., 1987 Canc. Res. 47:999; Wood et al., 1985 Nature
314:446; Shaw et al., 1988 J. Natl. Cancer Inst. 80:15553,
all incorporated herein by reference).

General reviews of "humanized" chimeric 
antibodies are provided by Morrison S., 1985 Science 
229:1202 and by Oi et al., 1986 BioTechniques 4:214.

Suitable "humanized" antibodies can be
alternatively produced by CDR or CEA substitution (Jones et
al., 1986 Nature 321:552; Verhoeyan et al., 1988 Science
239:1534; Biedler et al. 1988 J. Immunol. 141:4053, all
incorporated herein by reference).

The antibodies or antigen binding fragments may
also be produced by genetic engineering. The technology
for expression of both heavy and light chain genes in E.
coli is the subject of PCT patent applications,
publication numbers WO 901443, WO901443, and WO 9014424, and
is described in Huse et al., 1989 Science 246:1275-1281.

The antibodies can also be used as a means of
enhancing the immune response. The antibodies can be
administered in amounts similar to those used for other
therapeutic administrations of antibody. For example,
normal immune globulin is administered at 0.02-0.1 ml/lb
body weight during the early incubation period of other
viral diseases such as rabies, measles, and hepatitis B to
interfere with viral entry into cells. Thus, antibodies
reactive with the HCV E1 and/or core proteins can be
passively administered alone or in conjunction with another
anti-viral agent to a host infected with an HCV to enhance
the immune response and/or the effectiveness of an
antiviral drug.

Alternatively, anti-HCV E1 antibodies and
anti-HCV core antibodies can be induced by administering
anti-idiotype antibodies as immunogens. Conveniently, a
purified anti-HCV E1 or anti-HCV core antibody preparation
prepared as described above is used to induce anti-idiotype
antibody in a host animal. The composition is administered
to the host animal in a suitable diluent. Following
administration, usually repeated administration, the host
produces anti-idiotype antibody. To eliminate an
immunogenic response to the Fc region, antibodies produced
by the same species as the host animal can be used or the
Fc region of the administered antibodies can be removed.
Following induction of anti-idiotype antibody in the host
animal, serum or plasma is removed to provide an antibody
composition. The composition can be purified as described
above for anti-HCV E1 and anti-HCV core antibodies, or by
affinity chromatography using anti-HCV E1 or anti-HCV core
antibodies bound to the affinity matrix. The anti-idiotype
antibodies produced are similar in conformation to the
authentic HCV E1 or core protein and may be used to prepare
an HCV vaccine rather than using an HCV E1 or core protein.

When used as a means of inducing anti-HCV
antibodies in an animal, the manner of injecting the
antibody is the same as for vaccination purposes, namely
intramuscularly, intraperitoneally, subcutaneously or the
like in an effective concentration in a physiologically
suitable diluent with or without adjuvant. One or more
booster injections may be desirable.

The HCV E1 and core proteins of the invention are
also intended for use in producing antiserum designed for
pre- or post-exposure prophylaxis. Here an E1 or core
protein, or mixture of E1 and/or core proteins, is
formulated with a suitable adjuvant and administered by
injection to human volunteers, according to known methods
for producing human antisera. Antibody response to the
injected proteins is monitored, during a several-week
period following immunization, by periodic serum sampling
to detect the presence of anti-HCV E1 and/or anti-HCV core
serum antibodies, using an immunoassay as described herein.

The antiserum from immunized individuals may be
administered as a pre-exposure prophylactic measure for
individuals who are at risk of contracting infection. The
antiserum is also useful in treating an individual
post-exposure, analogous to the use of high titer antiserum
against hepatitis B virus for post-exposure prophylaxis.

For both in vivo use of antibodies to HCV
virus-like particles and proteins and anti-idiotype antibodies
and diagnostic use, it may be preferable to use monoclonal
antibodies. Monoclonal anti-HCV E1 and anti-HCV core
protein antibodies or anti-idiotype antibodies can be
produced as follows. The spleen or lymphocytes from an
immunized animal are removed and immortalized or used to
prepare hybridomas by methods known to those skilled in the
art (Goding, J.W. 1983. Monoclonal Antibodies:
Principles and Practice, Academic Press, Inc., NY, NY, pp.
56-97). To produce a human-human hybridoma, a human
lymphocyte donor is selected. A donor known to be infected
with HCV (where infection has been shown for example by the
presence of anti-virus antibodies in the blood or by virus
culture) may serve as a suitable lymphocyte donor.
Lymphocytes can be isolated from a peripheral blood sample
or spleen cells may be used if the donor is subject to
splenectomy. Epstein-Barr virus (EBV) can be used to
immortalize human lymphocytes or a human fusion partner can
be used to produce human-human hybridomas. Primary in
vitro immunization with peptides can also be used in the
generation of human monoclonal antibodies.

Antibodies secreted by the immortalized cells are
screened to determine the clones that secrete antibodies of
the desired specificity. For monoclonal anti-E1 and
anti-core antibodies, the antibodies must bind to HCV E1 and
core proteins respectively. For monoclonal anti-idiotype
antibodies, the antibodies must bind to anti-E1 and
anti-core protein antibodies respectively. Cells producing
antibodies of the desired specificity are selected.

The present invention also relates to the use of
single-stranded antisense poly- or oligonucleotides derived
from nucleotide sequences substantially homologous to those
shown in SEQ ID NOs:1-51 to inhibit the expression of
hepatitis C E1 genes. The present invention further
relates to the use of single-stranded antisense poly- or
oligonucleotides derived from nucleotide sequences
substantially homologous to those shown in SEQ ID
NOs:103-154 to inhibit the expression of hepatitis C core
genes. Alternatively, the antisense poly- or
oligonucleotides may be complementary to both the E1 and
core genes and hence inhibit the expression of both
hepatitis C E1 and core genes. By "substantially
homologous", as used throughout the specification and
claims to describe the nucleic acid sequences of the
present invention, is meant a level of homology between the
nucleic acid sequence and the SEQ ID NOs referred to in the
above sentences. Preferably, the level of homology is in
excess of 80%, more preferably in excess of 90%, with a
preferred nucleic acid sequence being in excess of 95%
homologous with the DNA sequence shown in the indicated SEQ
ID NO. These antisense poly- or oligonucleotides can be
either DNA or RNA. The targeted sequence is typically
messenger RNA and more preferably, a single sequence
required for processing or translation of the RNA. The
antisense poly- or oligonucleotides can be conjugated to a
polycation such as polylysine as disclosed in Lemaitre, M.
et al. ((1989) Proc. Natl. Acad. Sci. USA 84:648-652) and
this conjugate can be administered to a mammal in an amount
sufficient to hybridize to and inhibit the function of the
messenger RNA.
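As an illustration of the complementarity requirement described above, the following minimal sketch derives a DNA antisense oligonucleotide from an RNA target region by taking its reverse complement. The function name and the 15-base target are hypothetical and are not taken from any SEQ ID NO; the sketch assumes an unambiguous A/C/G/U (or T) target.

```python
# Complement table covering RNA targets (U) and DNA targets (T);
# the oligonucleotide is emitted as DNA.
_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def antisense_dna(target: str) -> str:
    """Return the DNA antisense oligonucleotide (5'->3') for a target given 5'->3'."""
    # Reverse the target, then complement each base, so that the result
    # is antiparallel and base-paired to the target.
    return "".join(_DNA_COMPLEMENT[base] for base in reversed(target.upper()))


# Hypothetical target region, for illustration only.
example_target = "AUGAGCACGAAUCCU"
example_oligo = antisense_dna(example_target)
print(example_oligo)
```

An RNA antisense oligonucleotide would be obtained the same way with U in place of T in the output.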

The present invention further relates to multiple
computer-generated alignments of the nucleotide and deduced
amino acid sequences shown in SEQ ID NOs:1-206. Computer
analysis of the nucleotide sequences shown in SEQ ID
NOs:1-51 and 103-154 and of the deduced amino acid sequences
shown in SEQ ID NOs:52-102 and 155-206 can be carried out
using commercially available computer programs known to one
skilled in the art.

In one embodiment, computer analysis of SEQ ID
NOs:1-51 by the program GENALIGN (Intelligenetics, Inc.,
Mountain View, CA) results in distribution of the 51 HCV E1
sequences into twelve genotypes based upon the degree of
variation of the sequences. For the purposes of the
present invention, the nucleotide sequence identity of E1
cDNAs of HCV isolates of the same genotype is in the range
of about 85% to about 100% whereas the identity of E1 cDNA
sequences of different genotypes is in the range of about
50% to about 80%.
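The identity ranges above can be applied as in the following sketch, which computes percent identity over two pre-aligned sequences and compares it with the stated ranges. This is not the GENALIGN procedure; the treatment of gap columns, the hard cut-offs substituted for "about", and the "indeterminate" label for values falling between the two ranges are assumptions made for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned positions, skipping columns that contain a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are ignored
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def relationship(identity: float) -> str:
    """Map a percent identity onto the ranges stated in the text."""
    if identity >= 85.0:
        return "same genotype"
    if 50.0 <= identity <= 80.0:
        return "different genotypes"
    return "indeterminate"  # assumption: values outside both ranges are not classified
```

For example, two aligned ten-base toy sequences differing at one position give 90% identity and would be reported as belonging to the same genotype.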

The grouping of SEQ ID NOs:1-51 into twelve HCV
genotypes is shown below.

SEQ ID NOs:     Genotypes

1-8             I/1a
9-25            II/1b
26-29           III/2a
30-33           IV/2b
34              2c
35-39           V/3a
40              4a
41              4b
42-43           4c
44              4d
45-50           5a
51              6a

For those genotypes containing more than one E1
nucleotide sequence, computer alignment of the constituent
nucleotide sequences of the genotype was conducted using
GENALIGN in order to produce a consensus sequence for each
genotype. These alignments and their resultant consensus
sequences are shown in Figures 1A-G for the seven genotypes
(I/1a, II/1b, III/2a, IV/2b, V/3a, 4c and 5a) which
comprise more than one nucleotide sequence. Further
alignment of the consensus sequences of Figures 1A-G with
SEQ ID NO:34 (genotype 2c), SEQ ID NO:40 (genotype 4a), SEQ
ID NO:41 (genotype 4b), SEQ ID NO:44 (genotype 4d) and SEQ
ID NO:51 (genotype 6a) produces a consensus sequence for
all twelve genotypes as shown in Figure 1H. The multiple
alignments of nucleotide sequences shown in Figures 1A-H
produce consensus sequences which serve to highlight
regions of homology and non-homology between sequences
found within the same genotype or in different genotypes
and hence, these alignments can be used by one skilled in
the art to design oligonucleotides useful as reagents in
diagnostic assays for HCV.
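The idea of deriving a consensus sequence from an alignment, and of using it to locate regions of homology and non-homology, can be sketched as follows. The majority rule and the use of "N" for tied columns are assumptions; the rules actually applied by GENALIGN are not described here, and the input sequences in any use of this sketch must already be aligned to equal length.

```python
from collections import Counter


def consensus(aligned: list[str], ambiguous: str = "N") -> str:
    """Majority-rule consensus of equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    out = []
    for column in zip(*(s.upper() for s in aligned)):
        counts = Counter(column).most_common()
        top_base, top_count = counts[0]
        # Assumption: a tie for the most frequent base is reported as ambiguous.
        tied = len(counts) > 1 and counts[1][1] == top_count
        out.append(ambiguous if tied else top_base)
    return "".join(out)


def variable_positions(aligned: list[str]) -> list[int]:
    """1-based alignment positions at which the sequences are not all identical."""
    columns = zip(*(s.upper() for s in aligned))
    return [i + 1 for i, column in enumerate(columns) if len(set(column)) > 1]
```

Positions returned by `variable_positions` within a genotype are poor candidates for primers, whereas positions that are invariant within one genotype but differ between genotypes are candidates for genotype-specific oligonucleotides.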

Examples of purified and isolated oligonucleotide
sequences derived from the consensus sequences shown in
Figures 1A-H include, but are not limited to, SEQ ID
NOs:213-239 where these oligonucleotides are useful as
"genotype-specific" primers and probes since these
oligonucleotides can hybridize specifically to the
nucleotide sequence of the E1 gene of HCV isolates
belonging to a single genotype. The genotype-specificity
of the oligonucleotides shown in SEQ ID NOs:213-239 is as
follows: SEQ ID NOs:213-214 are specific for genotype
I/1a; SEQ ID NOs:215-216 are specific for genotype II/1b;
SEQ ID NOs:217-218 are specific for genotype III/2a; SEQ ID
NOs:219-220 are specific for genotype IV/2b; SEQ ID
NOs:221-223 are specific for genotype 2c; SEQ ID
NOs:224-226 are specific for genotype V/3a; SEQ ID
NOs:227-228 are specific for genotype 4a; SEQ ID
NOs:229-230 are specific for genotype 4b; SEQ ID
NOs:231-232 are specific for genotype 4c; SEQ ID
NOs:233-234 are specific for genotype 4d; SEQ ID
NOs:235-236 are specific for genotype 5a and SEQ ID
NOs:237-239 are specific for genotype 6a.

In another embodiment, the computer analysis of
SEQ ID NOs:103-154 by the program GENALIGN results in
distribution of the 52 HCV core sequences into 14 genotypes
based upon the degree of variation of the sequences.

The grouping of SEQ ID NOs:103-154 into 14 HCV
genotypes is shown below.

SEQ ID NOs:     Genotypes

103-108         I/1a
109-124         II/1b
125-128         III/2a
129-133         IV/2b
134             2c
135-138         V/3a
139             4a
141             4b
143             4c
144             4c
145             4d
142             4e
140             4f
146-153         5a
154             6a


These 14 genotypes can be further grouped into 6
major genotypes designated genotypes 1-6 where genotype 1
comprises the sequences contained in minor genotypes I/1a
and II/1b; genotype 2 comprises the sequences contained in
minor genotypes III/2a, IV/2b and 2c; genotype 3 comprises
sequences contained in genotype V/3a; genotype 4 comprises
sequences contained in minor genotypes 4a-4f; genotype 5
comprises the sequences contained in genotype 5a and
genotype 6 comprises the sequence contained in genotype 6a.
Computer alignment of the constituent nucleotide sequences
of the core cDNAs falling within genotypes I/1a, II/1b,
III/2a, IV/2b, V/3a and 5a, to produce a consensus sequence
for each of these genotypes is shown in Figures 6A (I/1a),
6B (II/1b), 6D (III/2a), 6E (IV/2b), 6G (V/3a) and 6I (5a).
The alignment of the sequences found in minor genotypes
I/1a and II/1b to produce a consensus sequence for major
genotype 1 is shown in Figure 6C. The alignment of the
sequences contained in minor genotypes III/2a, IV/2b and 2c
to produce a consensus sequence for major genotype 2 is
shown in Figure 6F. The alignment of the nucleotide
sequences contained in minor genotypes 4a-4f to produce a
consensus sequence for major genotype 4 is shown in Figure
6H. Further alignment of the consensus sequences shown in
Figures 6C, 6F, 6G, 6H and 6I with SEQ ID NO:154 (genotype
6a/major genotype 6) to produce a consensus sequence for
all genotypes is shown in Figure 6J, and alignment of the
consensus sequences shown in Figures 6A, 6B, 6D, 6E, 6G and
6I with SEQ ID NO:134 (genotype 2c), SEQ ID NO:139
(genotype 4a), SEQ ID NO:141 (genotype 4b), SEQ ID NO:143
(genotype 4c), SEQ ID NO:145 (genotype 4d), SEQ ID NO:142
(genotype 4e), SEQ ID NO:140 (genotype 4f) and SEQ ID
NO:154 (genotype 6a) to produce a consensus sequence for
all fourteen genotypes is shown in Figure 6K. As with the
alignments of the envelope (E1) nucleotide sequences, the
consensus sequences shown in Figures 6A-6K serve to
highlight regions of homology and non-homology between
sequences found within the same genotype or in different
genotypes and hence, can be used by one skilled in the art
to design oligonucleotides useful as reagents in diagnostic
assays for HCV.
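The grouping of the core sequences into minor and major genotypes described above can be held in a simple lookup structure, sketched below. The SEQ ID NO ranges are transcribed from the grouping given in this specification (with SEQ ID NOs:143 and 144 both assigned to 4c); the names `CORE_GENOTYPES`, `minor_genotype` and `major_genotype` are hypothetical.

```python
# (range of core SEQ ID NOs, minor genotype), ordered by SEQ ID NO.
CORE_GENOTYPES = [
    (range(103, 109), "I/1a"),
    (range(109, 125), "II/1b"),
    (range(125, 129), "III/2a"),
    (range(129, 134), "IV/2b"),
    (range(134, 135), "2c"),
    (range(135, 139), "V/3a"),
    (range(139, 140), "4a"),
    (range(140, 141), "4f"),
    (range(141, 142), "4b"),
    (range(142, 143), "4e"),
    (range(143, 145), "4c"),
    (range(145, 146), "4d"),
    (range(146, 154), "5a"),
    (range(154, 155), "6a"),
]


def minor_genotype(seq_id_no: int) -> str:
    """Return the minor genotype for a core nucleotide SEQ ID NO (103-154)."""
    for id_range, genotype in CORE_GENOTYPES:
        if seq_id_no in id_range:
            return genotype
    raise ValueError(f"SEQ ID NO:{seq_id_no} is not a core nucleotide sequence")


def major_genotype(minor: str) -> int:
    """Return the major genotype (1-6), e.g. 'II/1b' -> 1 and '4e' -> 4."""
    arabic = minor.split("/")[-1]  # drop the Roman-numeral prefix if present
    return int(arabic[0])
```

For instance, `major_genotype(minor_genotype(142))` evaluates to 4, reflecting that SEQ ID NO:142 (genotype 4e) falls within major genotype 4.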

For example, purified and isolated
oligonucleotide sequences derived from the consensus
sequences shown in Figures 6A-6K may be useful as
genotype-specific primers and probes since these
oligonucleotides can hybridize specifically to the
nucleotide sequence of the core gene of HCV isolates
belonging to a given genotype. Examples of regions of the
consensus sequence of the core gene of a given genotype
from which primers specific for that genotype may be
deduced include, but are not limited to, the nucleotide
domains shown below for each genotype. The sequence in
which the indicated nucleotide domains are found is
indicated in parentheses to the right of each genotype.

Genotype 1 (Consensus Sequence of Figure 6C)
427-466, 444-483, 447-486 (5'-3', sense)
505-466, 522-483, 525-486 (5'-3', antisense)

Genotype 1a (Consensus Sequence of Figure 6A)
141-180, 279-318 (5'-3', sense)
219-180, 246-207 (5'-3', antisense)

Genotype 1b (Consensus Sequence of Figure 6B)
67-106, 127-186, 234-273 (5'-3', sense)
144-106, 225-186, 311-272, 312-273 (5'-3', antisense)

Genotype 2 (Consensus Sequence of Figure 6F)
153-192, 162-201, 164-203, 168-207, 171-210, 182-221,
192-231, 193-232, 302-341 (5'-3', sense)
231-192, 240-201, 242-203, 246-207, 249-210, 260-221,
270-231, 271-232, 380-341 (5'-3', antisense)

Genotype III/2a (Consensus Sequence of Figure 6D)
276-315, 306-355 (5'-3', sense)
309-270, 354-315, 394-355, 571-532 (5'-3', antisense)

Genotype IV/2b (Consensus Sequence of Figure 6E)
6-45, 135-174, 177-216, 309-348, 337-376, 375-414, 501-540
(5'-3', sense)
84-45, 213-174, 255-216, 387-348, 415-376, 453-414,
571-532, 573-540 (5'-3', antisense)

Genotype 2c (SEQ ID NO:134)
194-233, 273-312, 279-318, 417-456, 423-462, 504-543,
505-544, 517-556 (5'-3', sense)
272-233, 351-312, 354-315, 357-318, 450-411, 495-456,
501-462, 573-543, 556-573 (5'-3', antisense)

Genotype 3 or Genotype V/3a (Consensus Sequence of Figure
6G)
8-47, 45-84, 68-107, 87-126, 88-127, 90-129, 111-150,
142-181, 173-212, 177-216, 261-300, 276-315, 452-491,
520-559, 521-560, 529-568, 532-571, 533-572 (5'-3', sense)
86-47, 123-84, 146-107, 165-126, 186-147, 189-150, 219-180,
250-211, 251-212, 255-216, 339-300, 530-491, 573-543,
573-557, 573-559, 573-560 (5'-3', antisense)

Genotype 4 (Consensus Sequence of Figure 6H)
20-59 (5'-3', sense)
97-58, 98-59 (5'-3', antisense)

Genotype 4a (SEQ ID NO:139)
111-150, 150-189, 174-213, 183-222, 192-231, 261-300,
376-415, 396-435, 531-570 (5'-3', sense)
186-147, 252-213, 270-231, 339-300, 454-415 (5'-3',
antisense)

Genotype 4b (SEQ ID NO:141)
27-66, 30-69, 106-145, 271-310, 433-472, 447-486, 453-492
(5'-3', sense)
105-66, 183-144, 184-145, 345-306, 348-309, 349-310,
468-429, 510-471, 522-483, 570-531 (5'-3', antisense)

Genotype 4c (SEQ ID NO:143)
174-213, 180-219, 207-246, 231-270 (5'-3', sense)
249-210, 252-213, 258-219, 309-270, 504-465 (5'-3',
antisense)

Genotype 4d (SEQ ID NO:145)
173-212, 188-327, 430-469 (5'-3', sense)
248-209, 249-210, 250-211, 251-212, 366-327, 508-469
(5'-3', antisense)

Genotype 4e (SEQ ID NO:142)
160-199, 267-306, 287-326, 288-327, 524-564 (5'-3', sense)
238-199, 345-306, 365-326, 216-177, 522-483 (5'-3',
antisense)

Genotype 4f (SEQ ID NO:140)
18-57, 36-75, 228-267, 396-435 (5'-3', sense)
96-57, 114-75, 306-267 (5'-3', antisense)

Genotype 5 or 5a (Consensus Sequence of Figure 6I)
176-215, 177-216, 181-220, 195-234, 221-260, 252-291,
255-294, 396-435, 435-474, 447-486, 498-537 (5'-3', sense)
254-215, 299-260, 310-271, 330-291, 333-294, 354-315,
464-425, 471-432, 483-444, 570-531 (5'-3', antisense)

Genotype 6 or 6a (SEQ ID NO:154)
20-59, 136-175, 156-195, 159-198, 175-214, 185-224,
277-316, 278-317, 312-351, 348-387, 405-444, 406-445,
407-446, 408-447, 411-450, 432-471, 433-472, 435-474,
522-561 (5'-3', sense)
98-59, 214-175, 234-195, 237-198, 253-214, 262-223,
263-224, 354-315, 355-316, 382-343, 390-351, 426-387,
468-429, 483-444, 484-445, 485-446, 486-447, 489-450,
510-471, 511-472, 513-474 (5'-3', antisense)

Such nucleotide domains may range from about 15 
to about 100 bases in length with a more preferred range 
being about 30 to about 60 bases in length. 
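The domain notation used in the lists above can be turned into primer sequences as in the following sketch. It rests on an assumed reading of the notation: coordinates are 1-based and inclusive on the sense strand of the indicated consensus sequence or SEQ ID NO, an ascending domain such as 427-466 denotes a sense primer read directly from the sequence, and a descending domain such as 505-466 denotes an antisense primer that is the reverse complement of positions 466-505. The function name is hypothetical and the reference is assumed to be written in the DNA alphabet (A, C, G, T).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def primer_from_domain(reference: str, domain: str) -> str:
    """Return the 5'->3' primer sequence for a domain such as '427-466' or '505-466'."""
    start_text, end_text = domain.split("-")
    start, end = int(start_text), int(end_text)
    low, high = min(start, end), max(start, end)
    if low < 1 or high > len(reference):
        raise ValueError(f"domain {domain} lies outside the reference sequence")
    segment = reference[low - 1:high].upper()  # 1-based inclusive -> Python slice
    if start <= end:
        return segment  # sense primer: identical to the reference
    # Antisense primer: complement each base, then reverse.
    return segment.translate(_COMPLEMENT)[::-1]
```

With a hypothetical 20-base reference `ACGTTGCAAGGCTTAACCGG`, the domain `3-8` yields `GTTGCA` and the domain `8-3` yields its reverse complement `TGCAAC`.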

In an alternative embodiment, universal primers
able to hybridize to the nucleotide sequences of the core
gene of HCV isolates belonging to all of the genotypes
disclosed herein may be deduced from universally conserved
nucleotide domains of the consensus sequences shown in
Figures 6J and 6K. Examples of such nucleotide domains
include, but are not limited to, those shown below:

nucleotides 1-20, 1-25, 1-26, 1-27, 1-33, 50-89,
51-90, 52-91, 53-92, 61-100, 62-101, 77-116, 78-117,
79-118, 80-119, 81-120, 82-121, 83-122, 84-123, 85-124,
86-125, 97-136, 98-137, 99-138, 100-139, 101-140, 102-141,
329-368, 330-369, 331-370, 332-371, 354-393, 355-394,
356-395, 362-401, 363-402, 364-403, 365-404, 369-408,
442-481, 443-482, 457-496, 458-497, 475-514, 476-515,
477-516 (5'-3', sense); and

nucleotides 40-1, 41-2, 42-3, 43-4, 51-12, 52-13,
55-16, 56-17, 57-18, 58-19, 61-22, 62-23, 63-24,
64-25, 70-31, 124-85, 125-86, 126-87, 127-88, 128-89,
129-90, 136-97, 137-98, 138-99, 149-110, 150-111, 151-112,
152-113, 153-114, 154-115, 155-116, 156-117, 157-118,
158-119, 159-120, 170-131, 171-132, 172-133, 173-134,
174-135, 175-136, 403-364, 405-365, 406-366, 406-367,
430-391, 431-392, 432-393, 436-397, 437-398, 438-399,
439-400, 517-478, 518-479, 519-480, 532-493, 533-494,
550-511, 551-512 (5'-3', antisense)

Those skilled in the art would readily understand
that the term "antisense" as used herein refers to primer
sequences which are the complementary sequence of the
indicated consensus sequence or SEQ ID NO. Further,
provided with the above examples of regions of the
consensus sequences or indicated SEQ ID NOs from which to
deduce universal and genotype-specific primers, those
skilled in the art would readily be able to select pairs of
primers, one sense and one antisense, which would be useful
in the detection of HCV genotypes via the PCR methods
described herein.

In yet another embodiment, the sequences shown in
SEQ ID NOs:103-154 and the resultant consensus sequences
produced by alignment of these SEQ ID NOs as shown in
Figures 6A-6K may also be useful in the design of
hybridization probes specific for a given HCV genotype.
Examples of nucleotide domains of the consensus sequence or
SEQ ID NO of a given genotype from which genotype-specific
hybridization probes may be deduced include, but are not
limited to, those shown below where the sequence in which
the domains are found is indicated in parentheses to the
right of each genotype.

Genotype                                 Position

1a (Consensus sequence of Figure 6A)     50-85, 155-205, 207-277,
                                         281-333, 429-477, 530-573

1b (Consensus sequence of Figure 6B)     81-131, 159-225, 252-318,
                                         411-472, 530-573

2a (Consensus sequence of Figure 6D)     35-75, 200-276, 290-340,
                                         330-380, 410-472, 530-573

2b (Consensus sequence of Figure 6E)     20-70, 149-199, 191-241,
                                         240-285, 261-318, 323-373,
                                         351-401, 389-439, 429-477,
                                         530-573

2c (SEQ ID NO:134)                       208-258, 230-276, 290-345,
                                         411-460, 430-490, 530-573

3a (Consensus sequence of Figure 6G)     1-50, 40-100, 100-160,
                                         145-190, 190-240, 275-325,
                                         411-455, 466-516, 530-573

4a (SEQ ID NO:139)                       35-85, 145-195, 200-250,
                                         255-305, 341-390, 390-440,
                                         530-573

4b (SEQ ID NO:141)                       35-85, 120-170, 180-225,
                                         230-275, 285-335, 405-455,
                                         462-492, 530-573

4c (SEQ ID NO:143)                       35-85, 190-246, 245-295,
                                         282-318, 372-415, 440-480,
                                         530-573

4d (SEQ ID NO:145)                       35-85, 187-237, 302-352,
                                         405-455, 444-494, 530-573

4e (SEQ ID NO:142)                       35-85, 57-84, 174-224,
                                         230-275, 290-340, 422-472,
                                         530-573

4f (SEQ ID NO:140)                       35-85, 174-224, 242-292,
                                         290-340, 422-472, 530-573

5a (Consensus sequence of Figure 6I)     180-234, 265-315, 315-355,
                                         420-486, 530-573

6a (SEQ ID NO:154)                       34-84, 150-200, 180-230,
                                         230-290, 291-333, 341-395,
                                         429-490, 530-573

1 (Consensus sequence of Figure 6C)      192-241, 435-495

2 (Consensus sequence of Figure 6F)      186-240, 320-360, 440-475

4 (Consensus sequence of Figure 6H)      40-80

In yet another embodiment, universal
hybridization probes may be derived from the consensus
sequences shown in Figures 6J and 6K. Examples of
nucleotide domains of the consensus sequences shown in
Figures 6J and 6K from which universal hybridization probes
may be derived include, but are not limited to, 1-33;
85-141; 364-408; 478-516.

The oligonucleotides of this invention can be
synthesized using any of the known methods of
oligonucleotide synthesis (e.g., the phosphodiester method
of Agarwal et al. 1972, Angew. Chem. Int. Ed. Engl. 11:451,
the phosphotriester method of Hsiung et al. 1979, Nucleic
Acids Res 6:1371, or the automated diethylphosphoramidite
method of Beaucage et al. 1981, Tetrahedron Letters
22:1859-1862), or they can be isolated fragments of
naturally occurring or cloned DNA. In addition, those
skilled in the art would be aware that oligonucleotides can
be synthesized by automated instruments sold by a variety
of manufacturers or can be commercially custom ordered and
prepared. In a preferred embodiment, the oligonucleotides
of the present invention are synthetic oligonucleotides.
The oligonucleotides of the present invention may range
from about 15 to about 100 nucleotides, with the preferred
sizes being about 20 to about 60 nucleotides; a more
preferred size being about 25 to about 50 nucleotides; and
a most preferred size being about 30 to about 40
nucleotides.

The present invention also relates to methods for 
detecting the presence of HCV in a mammal, said methods 
comprising analyzing the RNA of a mammal for the presence 
of hepatitis C virus. 

The RNA to be analyzed can be isolated from
serum, liver, saliva, lymphocytes or other mononuclear
cells as viral RNA, whole cell RNA or as poly(A)+ RNA.
Whole cell RNA can be isolated by methods known to those
skilled in the art. Such methods include extraction of RNA
by differential precipitation (Birnboim, H.C. (1988)
Nucleic Acids Res., 16:1487-1497), extraction of RNA by
organic solvents (Chomczynski, P. et al. (1987) Anal.
Biochem., 162:156-159) and extraction of RNA with strong
denaturants (Chirgwin, J.M. et al. (1979) Biochemistry,
18:5294-5299). Poly(A)+ RNA can be selected from whole cell
RNA by affinity chromatography on oligo-d(T) columns (Aviv,
H. et al. (1972) Proc. Natl. Acad. Sci., 69:1408-1412). A
preferred method of isolating RNA is extraction of viral
RNA by the guanidinium-phenol-chloroform method of Bukh et
al. (1992a).

The methods for analyzing the RNA for the
presence of HCV include Northern blotting (Alwine, J.C. et
al. (1977) Proc. Natl. Acad. Sci., 74:5350-5354), dot and
slot hybridization (Kafatos, F.C. et al. (1979) Nucleic
Acids Res., 7:1541-1552), filter hybridization (Hollander,
M.C. et al. (1990) Biotechniques, 9:174-179), RNase
protection (Sambrook, J. et al. (1989) in "Molecular
Cloning, A Laboratory Manual", Cold Spring Harbor Press,
Plainview, NY) and reverse-transcription polymerase chain
reaction (RT-PCR) (Watson, J.D. et al. (1992) in
"Recombinant DNA" Second Edition, W.H. Freeman and Company,
New York).

A preferred method for analyzing the RNA is
RT-PCR. In this method, the RNA can be reverse transcribed to
first strand cDNA using a primer or primers derived from
the nucleotide sequences shown in SEQ ID NOs:1-51 or SEQ ID
NOs:103-154 or sequences complementary to those described.
Once the cDNAs are synthesized, PCR amplification is
carried out using pairs of primers designed to hybridize
with sequences in the HCV E1 or core cDNA which are an
appropriate distance apart (at least about 50 nucleotides)
to permit amplification of the cDNA and subsequent
detection of the amplification product. Alternatively, one
can amplify both E1 and core cDNA sequences by using a
primer pair where one primer hybridizes with the E1 cDNA
sequence and the other primer hybridizes with the core cDNA
sequence. Each primer of a pair is a single-stranded
oligonucleotide of about 20 to about 60 bases in length
with a more preferred range being about 30 to about 50
bases in length where one primer (the "upstream" primer) is
complementary to the original RNA and the second primer
(the "downstream" primer) is complementary to the first
strand of cDNA generated by reverse transcription of the
RNA. The target sequence is generally about 100 to about
300 base pairs long but can be as large as 500-1500 base
pairs. Optimization of the amplification reaction to
obtain sufficiently specific hybridization to the
nucleotide sequence of interest (either E1 or core or both
E1 and core) is well within the skill in the art and is
preferably achieved by adjusting the annealing temperature.
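The length and spacing criteria in the preceding paragraph can be checked mechanically, as in the sketch below. The class and function names are hypothetical. The sketch assumes that primer sites are given as 1-based inclusive coordinates on the sense strand, that "distance apart" means the number of bases lying between the two sites, and that the product spans from the first base of one site to the last base of the other; the numeric limits are the "about" values from the text used as hard cut-offs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerSite:
    start: int  # 1-based position of the lower end on the sense strand
    end: int    # inclusive upper end

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def check_pair(first: PrimerSite, second: PrimerSite,
               min_gap: int = 50, min_len: int = 20, max_len: int = 60):
    """Return (product_size, problems) for a pair whose first site lies 5' of the second."""
    problems = []
    for name, site in (("first", first), ("second", second)):
        if not min_len <= site.length <= max_len:
            problems.append(
                f"{name} primer length {site.length} outside {min_len}-{max_len}")
    gap = second.start - first.end - 1  # bases between the two primer sites
    if gap < min_gap:
        problems.append(f"primer sites only {gap} bases apart (minimum {min_gap})")
    product_size = second.end - first.start + 1
    return product_size, problems


# Illustration using the E1 domains 168-207 and 432-480 named in this
# specification for genotype 4b; treating each whole domain as a primer
# site is an assumption made for the example.
size, issues = check_pair(PrimerSite(168, 207), PrimerSite(432, 480))
print(size, issues)
```

Under these assumptions the example pair spans 313 bases with no reported problems, which lies within the broader 500-1500 base pair upper allowance stated above.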

In one embodiment, the primer pairs selected to
amplify E1 and core cDNAs are universal primers. By
"universal", as used to describe primers throughout the
claims and specification, is meant those primer pairs which
can amplify E1 and/or core gene fragments derived from an
HCV isolate belonging to any one of the genotypes of HCV
described herein. Purified and isolated universal primers
for E1 cDNAs are used in Example 1 of the present invention
and are shown as SEQ ID NOs:207-212 where SEQ ID NOs:207
and 208 represent one pair of primers, SEQ ID NOs:209 and
210 represent a second pair of primers and SEQ ID NOs:211
and 212 represent a third pair of primers. Nucleotide domains
of the consensus sequence shown in Figure 6J from which
universal primers for core cDNAs may be deduced have
previously been disclosed within the present specification.
Alternatively, a universal primer for E1 cDNA sequence and
a universal primer for core cDNA sequence may be used as a
universal primer pair to amplify both E1 and core cDNAs.

In an alternative embodiment, primer pairs
selected to amplify E1 and/or core cDNAs are
genotype-specific primers. In the present invention,
genotype-specific primer pairs can readily be derived from the
following genotype-specific E1 nucleotide domains:
nucleotides 197-238 and 450-480 of the consensus sequence
of genotype I/1a shown in Figure 1A; nucleotides 197-238
and 450-480 of the consensus sequence of genotype II/1b
shown in Figure 1B; nucleotides 199-238 and 438-480 of the
consensus sequence of genotype III/2a shown in Figure 1C;
nucleotides 124-177 and 450-480 of the consensus sequence
of genotype IV/2b shown in Figure 1D; nucleotides 124-177,
193-238 and 436-480 of SEQ ID NO:34 (genotype 2c);
nucleotides 168-207, 294-339 and 406-480 of the consensus
sequence of genotype V/3a shown in Figure 1E; nucleotides
145-183 and 439-480 of SEQ ID NO:40 (genotype 4a);
nucleotides 168-207 and 432-480 of SEQ ID NO:41 (genotype
4b); nucleotides 130-183 and 450-480 of the consensus
sequence of genotype 4c shown in Figure 1F; nucleotides
130-183 and 450-480 of SEQ ID NO:44 (genotype 4d);
nucleotides 166-208 and 437-480 of the consensus sequence
of genotype 5a shown in Figure 1G and nucleotides 168-207,
216-252 and 429-480 of SEQ ID NO:51 (genotype 6a).
Genotype-specific HCV core nucleotide domains from which
genotype-specific primers may be deduced have previously
been described herein. Those skilled in the art would
readily appreciate that in a pair of genotype-specific
primers, each primer is derived from different nucleotide
domains specific for a given genotype. Also, it is
understood by those skilled in the art that each pair of
primers comprises one primer which is complementary to the
original viral RNA and the other which is complementary to
the first strand of cDNA generated by reverse transcription
of the viral RNA. For example, in a pair of
genotype-specific primers for genotype 4b, one primer would have a
nucleotide sequence derived from region 168-207 of SEQ ID
NO:41 and the other primer would have a nucleotide sequence
which is the complement of region 432-480 of SEQ ID NO:41.
One skilled in the art would readily recognize that such
genotype-specific domains would also be useful in designing
oligonucleotides for use as genotype-specific hybridization
probes. Indeed, genotype-specific hybridization probes
deduced from the E1 and core sequences of the present
invention have been previously disclosed herein.

The amplification products of PCR can be detected
either directly or indirectly. In one embodiment, direct
detection of the amplification products is carried out via
labelling of primer pairs. Labels suitable for labelling
the primers of the present invention are known to one
skilled in the art and include radioactive labels, biotin,
avidin, enzymes and fluorescent molecules. The desired
labels can be incorporated into the primers prior to
performing the amplification reaction. A preferred
labelling procedure utilizes radiolabeled ATP and T4
polynucleotide kinase (Sambrook, J. et al. (1989) in
"Molecular Cloning, A Laboratory Manual", Cold Spring
Harbor Press, Plainview, NY). Alternatively, the desired
label can be incorporated into the primer extension
products during the amplification reaction in the form of
one or more labelled dNTPs. In the present invention, the
labelled amplified PCR products can be detected by agarose
gel electrophoresis followed by ethidium bromide staining
and visualization under ultraviolet light or via direct
sequencing of the PCR products. Thus, in one embodiment,
the present invention relates to a method for determining
the genotype of a hepatitis C virus present in a mammal
where said method comprises: amplifying RNA of a mammal
via RT-PCR using labelled genotype-specific primers for the
amplification step of the cDNA produced by reverse
transcription.

In yet another embodiment, unlabelled
amplification products can be detected via hybridization
with nucleic acid probes that are radioactively labelled
or labelled with biotin, in methods known to one skilled
in the art such as dot and slot blot hybridization
(Kafatos, F.C. et al. (1979)) or filter hybridization
(Hollander, M.C. et al. (1990)).

In one embodiment, the nucleic acid sequences
used as probes are selected from, and substantially
homologous to, SEQ ID NOs:1-51 and/or SEQ ID NOs:103-154.
Such probes are useful as universal probes in that they can
detect PCR-amplification products of E1 and/or core cDNAs
of an HCV isolate belonging to any of the HCV genotypes
disclosed herein. The size of these probes can range from
about 200 to about 500 nucleotides. In an alternative
embodiment, the sequence alignments shown in Figures 1A-1H
and 6A-6J may be used to design oligonucleotides useful as
universal hybridization probes. Examples of core and
envelope nucleotide domains from which such universal
oligonucleotides may be deduced are disclosed herein.

In yet another embodiment, the present invention
relates to a method for determining the genotype of a
hepatitis C virus present in a mammal where said method
comprises:

(a) amplifying RNA of a mammal via RT-PCR to
produce amplification products;

(b) contacting said products with at least one
genotype-specific oligonucleotide; and

(c) detecting complexes of said products which
bind to said oligonucleotide(s).

In this method, one embodiment of said 
amplification step is carried out using the universal 
primers for E1 or core cDNAs as disclosed above. In step 
(b) of this method, the genotype-specific sequences used as 
probes may be deduced from the genotype-specific E1 and 
core nucleotide domains disclosed herein. These probes are 
useful in specifically detecting PCR-amplification products 
of E1 or core cDNAs of HCV isolates belonging to one of the 
HCV genotypes disclosed herein. In a preferred embodiment, 
these probes are used alone or in combination with other 
probes specific to the same genotype. 

For example, a probe having a sequence according 
to SEQ ID NO:213 can be used alone or in combination with a 
probe having a sequence according to SEQ ID NO:214. The 
probes used in this method can range in size from about 15 
to about 100 nucleotides with a more preferred range being 
about 30 to about 70 nucleotides. Such probes can be 
synthesized as described earlier. 

In an alternative embodiment, the genotype of the 
amplification product of step (a) may be determined by 
using the nucleic acid sequences shown in SEQ ID NOs:1-51 
and 103-154 as probes (Delwart, E. et al. (1993) Science, 
262:1257-1261). Probes utilized in the method of Delwart 
et al. may range in size from about 100 to about 1,000 
nucleotides with a more preferred probe size being about 
200 to about 800 base pairs and a most preferred probe size 
being about 300 to about 700 nucleotides. 

The nucleic acid sequence used as a probe to 
detect PCR amplification products of the present invention 
can be labelled in single-stranded or double-stranded form. 
Labelling of the nucleic acid sequence can be carried out 
by techniques known to one skilled in the art. Such 
labelling techniques can include radiolabels and enzymes 
(Sambrook, J. et al. (1989) in "Molecular Cloning, A 
Laboratory Manual", Cold Spring Harbor Press, Plainview, 
New York). In addition, there are known non-radioactive 
techniques for signal amplification including methods for 
attaching chemical moieties to pyrimidine and purine rings 
(Dale, R.N.K. et al. (1973) Proc. Natl. Acad. Sci., 
70:2238-2242; Heck, R.F. (1968) J. Am. Chem. Soc., 
90:5518-5523), methods which allow detection by 
chemiluminescence (Barton, S.K. et al. (1992) J. Am. Chem. 
Soc., 114:8736-8740), methods utilizing biotinylated 
nucleic acid probes (Johnson, T.K. et al. (1983) Anal. 
Biochem., 133:126-131; Erickson, P.F. et al. (1982) J. of 
Immunology Methods, 51:241-249; Matthaei, F.S. et al. 
(1986) Anal. Biochem., 157:123-128) and methods which allow 
detection by fluorescence using commercially available 
products. 

The present invention also relates to computer 
analysis of the amino acid sequences shown in SEQ ID 
NOs:52-102 by the program GENALIGN. This analysis groups 
the 51 amino acid sequences shown in SEQ ID NOs:52-102 into 
twelve genotypes based upon the degree of variation of the 
amino acid sequences. For the purposes of the present 
invention, the amino acid sequence identity of E1 amino 
acid sequences of the same genotype ranges from about 85% 
to about 100% whereas the identity of E1 amino acid 
sequences of different genotypes ranges from about 45% to 
about 80%. 

The grouping of SEQ ID NOs:52-102 into twelve HCV 
genotypes is shown below: 

SEQ ID NOs:     Genotypes 

52-59           I/1a 
60-76           II/1b 
77-80           III/2a 
81-84           IV/2b 
85              2c 
86-90           V/3a 
91              4a 
92              4b 
93-94           4c 
95              4d 
96-101          5a 
102             6a 


For those genotypes containing more than one E1 
amino acid sequence, computer alignment of the constituent 
sequences of each genotype was conducted using the computer 
program GENALIGN in order to produce a consensus sequence 
for each genotype. These alignments and their resultant 
consensus sequences are shown in Figures 2A-G for the seven 
genotypes (I/1a, II/1b, III/2a, IV/2b, V/3a, 4c and 5a) 
which comprise more than one sequence. Further alignment 
of the consensus sequences shown in Figures 2A-G with the 
amino acid sequences of SEQ ID NO:85 (genotype 2c); SEQ ID 
NO:91 (genotype 4a); SEQ ID NO:92 (genotype 4b); SEQ ID 
NO:95 (genotype 4d) and SEQ ID NO:102 (genotype 6a) to 
produce a consensus amino acid sequence for all twelve 
genotypes is shown in Figure 2H. The multiple alignment of 
E1 amino acid sequences shown in Figures 2A-H produces 
consensus sequences which serve to highlight regions of 
homology and non-homology between E1 amino acid sequences 
of the same genotype and of different genotypes and hence, 
these alignments can readily be used by those skilled in 
the art to design peptides useful in assays and vaccines 
for the diagnosis and prevention of HCV infection. 
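
By way of illustration only, the derivation of a consensus sequence and of invariant positions from pre-aligned sequences can be sketched in Python as follows. This is not the GENALIGN program, whose internal rules are not described here; the function names, the majority rule, and the use of "X" for tied columns are assumptions made for the sketch.

```python
from collections import Counter


def consensus(aligned, ambiguous="X"):
    """Majority-rule consensus of equal-length, pre-aligned sequences.

    Assumption: a column with a tie for the most frequent residue is
    reported as `ambiguous`; GENALIGN may treat such columns differently.
    """
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        counts = Counter(column).most_common()
        top_residue, top_count = counts[0]
        tied = len(counts) > 1 and counts[1][1] == top_count
        out.append(ambiguous if tied else top_residue)
    return "".join(out)


def invariant_positions(aligned):
    """1-based positions at which every aligned sequence has the same residue."""
    return [i + 1 for i, column in enumerate(zip(*aligned))
            if len(set(column)) == 1]


# Hypothetical toy alignment (not taken from the sequence listing).
toy = ["GHRMAWDMM", "GHRMAWDMM", "GHKMAWDLM"]
print(consensus(toy))
print(invariant_positions(toy))
```

Positions reported as invariant in such an analysis are candidates for universal peptides, while positions that differ between genotype consensus sequences are candidates for genotype-specific peptides.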

In another embodiment, the computer analysis of 
SEQ ID NOs:155-206 by the program GENALIGN results in 
distribution of the 52 HCV core sequences into 14 genotypes 
based upon identification of genotype-specific amino acid 
sequences. 

The grouping of SEQ ID NOs:155-206 into 14 HCV 
genotypes is shown below: 

SEQ ID NOs:     Genotypes 

155-160         I/1a 
161-176         II/1b 
177-180         III/2a 
181-185         IV/2b 
186             2c 
187-190         V/3a 
191             4a 
193             4b 
195             4c 
196             4c 
197             4d 
194             4e 
192             4f 
198-205         5a 
206             6a 


These fourteen genotypes can be further grouped 
into six major genotypes designated genotypes 1-6 as 
described earlier for the core nucleotide sequences of the 
present application. Computer alignment of the amino acid 
sequences disclosed in SEQ ID NOs:155-206 is shown in 
Figures 7A-7J. As with the multiple alignments of the E1 
amino acid sequences, the consensus sequences shown in 
Figures 7A-7J serve to highlight regions of homology and 
non-homology between core amino acid sequences of the same 
genotype and of different genotypes and hence, these 
alignments can readily be used by those skilled in the art 
to design peptides useful in assays and vaccines for the 
diagnosis and prevention of HCV infection. 

Examples of purified and isolated peptides 
deduced from the alignments shown in Figures 2A-2H include, 
but are not limited to, SEQ ID NOs:240-263 wherein these 
peptides are derived from two regions of the amino acid 
sequences shown in Figures 2A-H, amino acids 48-80 and 
amino acids 138-160. The peptides shown in SEQ ID 
NOs:240-263 are useful as genotype-specific diagnostic 
reagents since they are capable of detecting an immune 
response specific to HCV isolates belonging to a single 
genotype. 

The genotype-specificity of the peptides shown in SEQ ID 
NOs:240-263 is as follows: SEQ ID NOs:240 and 252 are 
specific for genotype IV/2b; SEQ ID NOs:241 and 253 are 
specific for genotype 2c; SEQ ID NOs:242 and 254 are 
specific for genotype III/2a; SEQ ID NOs:243 and 255 are 
specific for genotype V/3a; SEQ ID NOs:244 and 256 are 
specific for genotype II/1b; SEQ ID NOs:245 and 257 are 
specific for genotype I/1a; SEQ ID NOs:246 and 258 are 
specific for genotype 4a; SEQ ID NOs:247 and 259 are 
specific for genotype 4c; SEQ ID NOs:248 and 260 are 
specific for genotype 4d; SEQ ID NOs:249 and 261 are 
specific for genotype 4b; SEQ ID NOs:250 and 262 are 
specific for genotype 5a and SEQ ID NOs:251 and 263 are 
specific for genotype 6a. In SEQ ID NO:240, Xaa at 
position 22 is a residue of Ala or Thr, Xaa at position 24 
is a residue of Val or Ile, Xaa at position 26 is a residue 
of Val or Met; in SEQ ID NO:242, Xaa at position 5 is a Ser 
or Thr residue, Xaa at position 11 is an Arg or Gln 
residue, Xaa at position 12 is an Arg or Gln residue; in 
SEQ ID NO:243, Xaa at position 3 is a Pro or Ser residue, 
Xaa at position 33 is a Leu or Met residue; in SEQ ID 
NO:244, Xaa at position 5 is a Thr or Ala residue, Xaa at 
position 13 is a Gly, Ala, Ser, Val or Thr residue, Xaa at 
position 14 is a Ser, Thr or Asn residue, Xaa at position 
15 is a Val or Ile residue, Xaa at position 16 is a Pro or 
Ser residue, Xaa at position 18 is a Thr or Lys residue, 
Xaa at position 19 is a Thr or Ala residue, Xaa at position 
22 is an Arg or His residue, Xaa at position 32 is an Ala, 
Val or Thr residue; in SEQ ID NO:245, Xaa at position 3 is 
an Ala or Pro residue, Xaa at position 4 is a Val or Met 
residue, Xaa at position 5 is a Thr or Ala residue, Xaa at 
position 17 is a Thr or Ala residue, Xaa at position 18 is 
a Thr or Ala residue, Xaa at position 23 is a His or Tyr 
residue; in SEQ ID NO:247, Xaa at position 10 is a Val or 
Ala residue, Xaa at position 11 is a Ser or Pro residue, 
Xaa at position 18 is an Asp or Glu residue, Xaa at 
position 20 is a Leu or Ile residue; in SEQ ID NO:250, Xaa 
at position 3 is a Gln or His residue, Xaa at position 12 
is an Asn, Ser or Thr residue, Xaa at position 13 is a Leu 
or Phe residue, Xaa at position 23 is an Ala or Val 
residue; in SEQ ID NO:252, Xaa at position 16 is a Val or 
Ala residue, Xaa at position 18 is a Glu or Gln residue; in 
SEQ ID NO:254, Xaa at position 2 is an Ala or Thr residue, 
Xaa at position 4 is a Met or Leu residue, Xaa at position 
9 is an Ala or Val residue, Xaa at position 17 is an Ile or 
Leu residue, Xaa at position 20 is an Ile or Val residue, 
Xaa at position 21 is a Ser or Gly residue; in SEQ ID 
NO:151, Xaa at position 9 is a Val or Ile residue, Xaa at 
position 16 is a Leu or Val residue, Xaa at position 20 is 
an Ile or Leu residue; in SEQ ID NO:256, Xaa at position 2 
is an Ala or Thr residue, Xaa at position 6 is a Val or Leu 
residue, Xaa at position 12 is an Ile or Leu residue, Xaa 
at position 16 is a Val or Ile residue, Xaa at position 17 
is a Val, Leu or Met residue, Xaa at position 19 is a Met 
or Val residue, Xaa at position 21 is an Ala or Thr 
residue; in SEQ ID NO:257, Xaa at position 2 is a Thr or 
Ala residue, Xaa at position 6 is a Val, Ile or Met 
residue, Xaa at position 12 is an Ile or Val residue, Xaa 
at position 16 is an Ile or Val residue; in SEQ ID NO:155, 
Xaa at position 5 is a Leu or Val residue, Xaa at position 
21 is a Thr or Ala residue; in SEQ ID NO:262, Xaa at 
position 1 is a Thr or Ala residue, Xaa at position 5 is a 
Val or Leu residue, Xaa at position 9 is a Leu, Met or Val 
residue, Xaa at position 23 is a Gly or Ala residue. 
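
A peptide defined with Xaa positions stands for a family of individual peptides, one for each combination of the permitted residues. The following Python sketch enumerates such a family. It is illustrative only: the function name and the four-residue template are hypothetical and are not taken from the sequence listing.

```python
from itertools import product


def expand_variants(template, choices):
    """List every peptide encoded by a template containing Xaa positions.

    template: residues in three-letter code, with "Xaa" at variable positions.
    choices:  maps a 1-based position to the residues permitted there.
    """
    slots = []
    for position, residue in enumerate(template, start=1):
        if residue == "Xaa":
            if position not in choices:
                raise KeyError(f"no residues given for Xaa at position {position}")
            slots.append(choices[position])
        else:
            slots.append([residue])
    # The Cartesian product of the per-position options gives all variants.
    return ["-".join(combo) for combo in product(*slots)]


# Hypothetical template with two variable positions.
toy_template = ["Gly", "Xaa", "Ser", "Xaa"]
toy_choices = {2: ["Ala", "Thr"], 4: ["Val", "Ile"]}
for peptide in expand_variants(toy_template, toy_choices):
    print(peptide)
```

The number of variants is the product of the numbers of permitted residues at each Xaa position, so a peptide with many variable positions can correspond to a large family.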

Examples of core amino acid domains from which 
genotype-specific peptides may be deduced include, but are 
not limited to, those shown below, where the sequence in 
which the indicated domains are found is given in 
parentheses to the right of each genotype: 

Genotype                                  Amino Acid Domains 

1a (consensus sequence of Figure 7A)      67-78 
1b (consensus sequence of Figure 7B)      67-78 
2  (consensus sequence of Figure 7F)      66-81; 110-119 
2a (consensus sequence of Figure 7D)      67-78; 115-125 
2b (consensus sequence of Figure 7E)      67-78; 123-133 
2c (SEQ ID NO:186)                        67-78; 75-81; 184-191 
3a (consensus sequence of Figure 7G)      8-22; 32-46; 67-78; 
                                          158-170; 180-191 
4  (consensus sequence of Figure 7H)      14-23 
4a (SEQ ID NO:191)                        67-78 
4b (SEQ ID NO:193)                        45-57; 67-78 
4c (SEQ ID NO:195)                        67-78 
4d (SEQ ID NO:197)                        67-78 
4e (SEQ ID NO:194)                        67-78 
4f (SEQ ID NO:192)                        67-78 
5a (consensus sequence of Figure 7J)      67-78 
6a (SEQ ID NO:206)                        67-78; 101-108; 
                                          144-155; 157-163 



Those skilled in the art would be aware that the 
peptides of the present invention or analogs thereof can be 
synthesized by automated instruments sold by a variety of 
manufacturers or can be commercially custom-ordered and 
prepared. The term analog has been described earlier in 
the specification and for purposes of describing the 
peptides of the present invention, analogs can further 
include branched, cyclic or other non-linear arrangements 
of the peptide sequences of the present invention. 

Alternatively, peptides can be expressed from 
nucleic acid sequences where such sequences can be DNA, 
cDNA, RNA or any variant thereof which is capable of 
directing protein synthesis. In one embodiment, 
restriction digest fragments containing a coding sequence 
for a peptide can be inserted into a suitable expression 
vector that functions in prokaryotic or eukaryotic cells. 
Such restriction digest fragments may be obtained from 
clones isolated from prokaryotic or eukaryotic sources 
which encode the peptide sequence. 

Suitable expression vectors and methods of 
isolating clones encoding the peptide sequences of the 
present invention have previously been described. In yet 
another embodiment, an oligonucleotide capable of directing 
host organism synthesis of the given peptide may be 
synthesized and inserted into the expression vector. 

The preferred size of the peptides of the present 
invention is from about 8 to about 100 amino acids in 
length when the peptides are chemically synthesized with a 
more preferred size being about 8 to about 30 amino acids 
and a most preferred size being about 10 to about 20 amino 
acids in length. For recombinantly expressed peptides, the 
size may range from about 20 to about 190 amino acids in 
length with a more preferred size being about 70 amino 
acids. 

The present invention further relates to the use 
of genotype-specific peptides in methods of detecting 
antibodies against a specific genotype of HCV in biological 
samples. In one embodiment, at least one genotype-specific 
peptide deduced from a genotype-specific core or E1 amino 
acid domain may be used in any of the immunoassays 
described herein to detect antibodies specific for a single 
genotype of HCV. In another embodiment, at least one 
genotype-specific peptide deduced from a genotype-specific 
core nucleotide domain and at least one genotype-specific 
peptide deduced from an E1 amino acid domain may be used in 
an immunoassay to detect antibodies against a single 
genotype of HCV. A preferred immunoassay is ELISA. 

It is understood by those skilled in the art that 
the diagnostic assays described herein using 
genotype-specific oligonucleotides or genotype-specific 
peptides can be useful in assisting one skilled in the art 
to choose a course of therapy for the HCV-infected 
individual. 

In an alternative embodiment, a mixture of 
genotype-specific peptides can be used in an immunoassay to 
detect antibodies against multiple genotypes of HCV 
disclosed herein. For example, a mixture of 
genotype-specific peptides deduced from E1 amino acid 
sequences may comprise at least one peptide selected from 
SEQ ID NOs:244-245 and 256-257; one peptide selected from 
SEQ ID NOs:240, 242, 252 and 254; one peptide selected from 
SEQ ID NOs:246-249 and 258-261; one peptide selected from 
SEQ ID NOs:250 and 262; one peptide selected from SEQ ID 
NOs:243 and 255; one peptide selected from SEQ ID NOs:242 
and 254 and one peptide selected from SEQ ID NOs:244 and 
263. In a preferred embodiment, the peptides of the 
present invention can be used in an ELISA assay as 
described previously for recombinant E1 and core proteins. 

In an alternative embodiment, the peptide(s) 
utilized in an immunoassay to detect all the genotypes of 
HCV disclosed herein may be a universal peptide deduced 
from universally conserved amino acid domains of the E1 or 
core proteins disclosed herein. 

Examples of universally conserved core amino acid 
domains within the consensus sequence shown in Figure 7J 
from which universal peptides may be deduced include, but 
are not limited to, amino acid domains 23-35, 53-66, 
93-108, 122-138, 150-156, and 165-181 of the consensus 
sequence. Examples of universally conserved E1 amino acid 
domains within the HCV E1 protein are located within the 
consensus sequence for the 51 HCV E1 proteins shown in 
Figure 2H of the present application. Examples of 
universally conserved domains within the consensus sequence 
shown in Figure 2H include, but are not limited to, amino 
acid domains 10-20, 111-120, and 124-137 of the consensus 
sequence. The universal peptides of the present invention 
may be used in an immunoassay to detect antibodies in 
patient sera specific for any of the genotypes of HCV 
disclosed herein. 

The peptides of the present invention or analogs 
thereof may be prepared in the form of a kit, alone or in 
combinations with other reagents such as secondary 
antibodies, for use in immunoassay. 

In another embodiment, the genotype-specific and 
universal peptides of the present invention may be used to 
produce antibodies that will react against HCV E1 or core 
proteins in immunoassays. In one embodiment, a 
genotype-specific E1 or core peptide can be used alone or 
in combination with other E1 or core peptides specific to 
the same genotype as immunogens to produce antibodies 
specific to HCV proteins of a single genotype. 

In another embodiment, a mixture of peptides 
specific for different genotypes may be used to produce 
antibodies that will react with HCV proteins of any 
genotype disclosed herein. More preferably, antibodies 
reactive with HCV proteins of any genotype may be produced 
by immunizing an animal with universal peptide(s) of the 
present invention. Examples of immunoassays in which such 
antibodies could be utilized to detect HCV E1 and core 
proteins in biological samples include, but are not limited 
to, radioimmunoassays and ELISAs. Examples of biological 
samples in which HCV E1 and core proteins could be detected 
include, but are not limited to, serum, saliva and 
liver. 

Of course, those skilled in the art would readily 
understand that the genotype-specific and universal 
peptides of the present invention and expression vectors 
containing nucleic acid sequence capable of directing host 
organism synthesis of these peptides could also be used as 
vaccines against hepatitis C. Formulations suitable for 
administering the peptide(s) and expression vectors of the 
present invention as immunogen, routes of administration, 
pharmaceutical compositions comprising the peptides, 
expression vectors and so forth are the same as those 
previously described for recombinant E1 and core proteins. 

The genotype-specific and universal peptides of 
the present invention and expression vectors containing 
nucleic acid sequence capable of directing host organism 
synthesis of these peptides may also be supplied in the 
form of a kit, alone, or in the form of a pharmaceutical 
composition as described above for recombinant E1 and core 
proteins. 

Any articles or patents referenced herein are 
incorporated by reference. The following examples 
illustrate various aspects of the invention but are in no 
way intended to limit the scope thereof. 

MATERIALS 

Serum used in these examples was obtained from 
84 anti-HCV positive individuals who were previously found 
to be positive for HCV RNA in a cDNA PCR assay with primer 
set a from the 5' NC region of the HCV genome (Bukh, J. et 
al. (1992(b)) Proc. Natl. Acad. Sci. USA 89:4942-4946). 
These samples were from 12 countries: Denmark (DK); 
Dominican Republic (DR); Germany (D); Hong Kong (HK); India 
(IND); Sardinia, Italy (S); Peru (P); South Africa (SA); 
Sweden (SW); Taiwan (T); United States (US); and Zaire (Z). 

Example 1 

Identification of the cDNA Sequence 
of the E1 Gene of 51 Isolates of HCV via 
RT-PCR Analysis of Viral RNA Using Universal Primers 

Viral RNA was extracted from 100 µl of serum by 
the guanidinium-phenol-chloroform method and the final RNA 
solution was divided into 10 equal aliquots and stored at 
-80°C as described (Bukh et al. (1992a)). The sequences 
of the synthetic oligonucleotides used in the RT-PCR assay, 
deduced from the sequence of HCV strain H-77 (Ogata, N. et 
al. (1991) Proc. Natl. Acad. Sci. USA 88:3392-3396), are 
shown as SEQ ID NOs:207-212. One aliquot of the final RNA 
solution, equivalent to 10 µl of serum, was used for cDNA 
synthesis that was performed in a 20 µl reaction mixture 
using avian myeloblastosis virus reverse transcriptase 
(Promega, Madison, WI) and SEQ ID NO:208 as a primer. The 
resulting cDNA was amplified in a "nested" PCR assay by Taq 
DNA polymerase (Amplitaq, Perkin-Elmer/Cetus) as described 
previously (Bukh et al. (1992a)) with primer set e (SEQ ID 
NOs:207-210). Precautions were taken to avoid 
contamination with exogenous HCV nucleic acid (Bukh et al. 
(1992a)), and negative controls (normal, uninfected serum) 
were interspersed between every test sample in both the RNA 
extraction and cDNA PCR procedures. No false positive 
results were observed in the analysis. In most instances, 
amplified DNA (first or second PCR products) was 
reamplified with primers SEQ ID NO:211 and SEQ ID NO:212 
prior to sequencing since these two primers contained EcoRI 
sites which would facilitate future cloning of the E1 gene. 
Amplified DNA was purified by gel electrophoresis followed 
by glass-milk extraction (Geneclean, BIO 101, La Jolla, CA) 
and both strands were sequenced directly by the 
dideoxynucleotide chain termination method (Bachman, B. et 
al. (1990) Nucl. Acids Res. 18:1309) with phage T7 DNA 
polymerase (Sequenase, United States Biochemicals, 
Cleveland, OH), [alpha-35S]dATP (Amersham, Arlington 
Heights, IL) or [alpha-33P]dATP (Amersham or DuPont, 
Wilmington, DE) and sequencing primers. RNA extracted from 
serum containing HCV strain H-77, previously sequenced by 
Ogata, N. et al. (1991), was amplified with primer set e 
(SEQ ID NOs:207-210) and sequenced in parallel as a 
control. The nucleotide sequences of the envelope 1 (E1) 
gene of all 51 HCV isolates are shown as SEQ ID NOs:1-51. 
In all 51 HCV isolates, the E1 gene was exactly 576 
nucleotides in length and did not have any in-frame stop 
codons. 
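
The two properties reported for each E1 sequence, a length of exactly 576 nucleotides (192 codons) and the absence of in-frame stop codons, are simple to verify computationally. The following Python sketch is illustrative only; the function names and the test sequences are hypothetical, and reading frame 1 is assumed to begin at the first nucleotide supplied.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(cdna):
    """Return 1-based nucleotide positions of stop codons in reading frame 1."""
    seq = cdna.upper().replace("U", "T")  # accept RNA or cDNA notation
    return [i + 1 for i in range(0, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]


def check_gene(cdna, expected_nt):
    """Summarize length and in-frame stop codons for one gene sequence."""
    return {
        "length_ok": len(cdna) == expected_nt,
        "codons": len(cdna) // 3,
        "stops": in_frame_stops(cdna),
    }


# Hypothetical 576-nucleotide sequence made of 192 alanine codons.
print(check_gene("GCT" * 192, expected_nt=576))
# A stop codon that is out of frame is not reported; an in-frame one is.
print(in_frame_stops("ATGATAAGC"), in_frame_stops("ATGTAAGCT"))
```

The same check applies to the core gene by passing an expected length of 573 nucleotides.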

Example 2 

Computer Analysis of the Nucleotide 
and Deduced Amino Acid Sequences 
of the E1 Gene of 51 HCV Isolates 

Multiple computer-generated alignments of the 
nucleotide (SEQ ID NOs:1-51, Figures 1A-H) and deduced 
amino acid sequences (SEQ ID NOs:52-102, Figures 2A-H) of 
the cDNAs of the 51 HCV isolates constructed using the 
computer program GENALIGN (Miller, R.H. et al. (1990) Proc. 
Natl. Acad. Sci. USA 87:2057-2061) resulted in the 51 HCV 
isolates being divided into twelve genotypes based upon the 
degree of variation of the E1 gene sequence as shown in 
Table 1. 





The grouping of SEQ ID NOs: into genotypes is previously described in the specification. 

The nucleotide and amino acid sequence identity 
of HCV isolates of the same genotype was in the range of 
88.0-99.1% and 89.1-98.4%, respectively, whereas that of 
HCV isolates of different genotypes was in the range of 
53.5-78.6% and 49.0-82.8%, respectively. The latter 
differences are similar to those found when comparing the 
envelope gene sequences of the various serotypes of the 
related flaviviruses, as well as other RNA viruses. When 
microheterogeneity in a sequence was observed, defined as 
more than one prominent nucleotide at a specific position, 
the nucleotide that was identical to that of the HCV 
prototype (HCV1, Choo et al. (1989)) was reported if 
possible. Alternatively, the nucleotide that was identical 
to the most closely related isolate is shown. 
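
Percent identity between two aligned sequences, and a comparison of that value with the ranges reported above, can be sketched in Python as follows. This is a minimal illustration and not the GENALIGN calculation; the function names and toy sequences are hypothetical, the default thresholds are simply the amino acid identity limits reported in this Example (at least 89.1% within a genotype, at most 82.8% between genotypes), and skipping gapped positions is an assumption.

```python
def percent_identity(a, b):
    """Percent of compared positions at which two aligned sequences match.

    Assumption: positions where either sequence has a gap ("-") are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def relationship(identity, same_min=89.1, different_max=82.8):
    """Compare an identity value with the reported amino acid ranges."""
    if identity >= same_min:
        return "same genotype"
    if identity <= different_max:
        return "different genotypes"
    return "indeterminate"


# Hypothetical 20-residue toy sequences (not from the sequence listing).
ref = "ACDEFGHIKLMNPQRSTVWY"
close = "ACDEFGHIKLMNPQRSTVWA"    # one substitution
distant = "AADDFFHHKKMMPPRRTTWW"  # ten substitutions
for other in (close, distant):
    value = percent_identity(ref, other)
    print(value, relationship(value))
```

Values falling between the two thresholds are returned as indeterminate rather than being forced into either group.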

Analysis of the consensus sequence of the E1 
protein of the 51 HCV isolates from this study demonstrated 
that a total of 60 (30.3%) of the 192 amino acids of the E1 
protein were invariant among these isolates (Fig. 3). Most 
impressive, all 8 cysteine residues as well as 6 of 8 
proline residues were invariant. The most abundant amino 
acids (e.g. alanine, valine and leucine) showed a very low 
degree of conservation. The consensus sequence of the E1 
protein contained 5 potential N-linked glycosylation sites. 
Three sites at positions 209, 305 and 325 were maintained 
in all 51 HCV isolates. A site at position 196 was 
maintained in all isolates except the sole isolate of 
genotype 2c. Also, a site at position 234 was maintained 
in all isolates except one isolate of genotype I/1a, all 
four isolates of genotype IV/2b and the sole isolate of 
genotype 6a. Conversely, only genotype IV/2b isolates had 
a potential glycosylation site at position 233. Further 
analysis revealed a highly conserved amino acid domain (aa 
302-328) in the E1 protein with 20 (74.1%) of 27 amino 
acids invariant among all 51 HCV isolates. It is possible 
that the 5' and 3' ends of this domain are conserved due to 
important cysteine residues and N-linked glycosylation 
sites. The central sequence, 5'-GHRMAWDMM-3' (aa 315-323), 
may be conserved due to additional functional constraints 
on the protein structure. Finally, although the amino acid 
sequence surrounding the putative E1 protein cleavage site 
was variable, an amino acid doublet (GV) at position 380 
was invariant among all HCV isolates. 
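
Potential N-linked glycosylation sites are conventionally identified by the sequon Asn-X-Ser/Thr, where X is any residue other than proline. A minimal Python sketch of such a scan is given below. The function name, the `first_residue_number` parameter used to report positions in another numbering, and the test peptide are hypothetical; the sketch does not reproduce the analysis performed for this Example.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The lookahead makes
# the match zero-width so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein, first_residue_number=1):
    """Positions of the Asn residue of each N-X-S/T sequon.

    first_residue_number: number assigned to the first residue supplied,
    so positions can be reported relative to a longer parent sequence.
    """
    return [m.start() + first_residue_number
            for m in SEQUON.finditer(protein.upper())]


# Hypothetical peptide: N-G-S and N-K-T are sequons; N-P-T is excluded.
toy = "MNGSANPTNKT"
print(sequon_positions(toy))
print(sequon_positions(toy, first_residue_number=192))
```

Running the scan over each aligned isolate and comparing the resulting position lists shows which sites are maintained in every isolate and which are lost or gained in particular genotypes.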

A dendrogram of the genetic relatedness of the E1 
protein of selected HCV isolates representing the 12 
genotypes is shown in Fig. 4. This dendrogram was 
constructed using the program CLUSTAL (Higgins, D.G. et al. 
(1988) Gene, 73:237-244) and had a limit of 25 sequences. 
The scale showing percent identity was added based upon 
manual calculation. From the 51 HCV isolates for which the 
complete sequence of the E1 gene region was obtained, 25 
isolates representing the twelve genotypes were selected 
for analysis. This dendrogram in combination with the 
analysis of the E1 gene sequence of 51 HCV isolates in 
Table 1 demonstrates extensive heterogeneity of this 
important gene. 

The worldwide distribution of the 12 genotypes 
among 74 HCV isolates is depicted in Fig. 5. The complete 
E1 gene sequence was determined in 51 of these HCV isolates 
(SEQ ID NOs:1-51), including 8 isolates of genotype I/1a, 
17 isolates of genotype II/1b and 26 isolates comprising 
genotypes III/2a, IV/2b, 2c, 3a, 4a-4d, 5a and 6a. In the 
remaining 23 isolates, all of genotypes I/1a and II/1b, the 
genotype assignment was based on a partial E1 gene sequence 
since they did not represent additional genotypes in any of 
the 12 countries. The number of isolates of a particular 
genotype is given in each of the 12 countries studied. Of 
the twelve genotypes, genotypes I/1a and II/1b were the 
most common, accounting for 48 (65%) of the 74 isolates. 
Analysis of the E1 gene sequences available in the GenBank 
data base at the time of this study revealed that all 44 
such sequences were of genotypes I/1a, II/1b, III/2a and 
IV/2b. Thus, based upon E1 gene analysis, 8 new genotypes 
of HCV have been identified. 

Also of interest, different HCV genotypes were 
frequently found in the same country, with the highest 
number of genotypes (five) being detected in Denmark. Of 
the twelve genotypes, genotypes I/1a, II/1b, III/2a, IV/2b 
and V/3a were widely distributed with genotype II/1b being 
identified in 11 of 12 countries studied (Zaire was the 
only exception). In addition, while genotypes I/1a and 
II/1b were predominant in the Americas, Europe and Asia, 
several new genotypes were predominant in Africa. 

It was also found that genotypes I/1a, II/1b, 
III/2a, IV/2b and V/3a of HCV were widely distributed 
around the world, whereas genotypes 2c, 4a, 4b, 4d, 5a and 
6a were identified only in discrete geographical regions. 
For example, the majority of isolates in South Africa 
comprised a new genotype (5a) and all isolates in Zaire 
comprised 3 new closely related genotypes (4a, 4b, 4c). 
These genotypes were not identified outside Africa. 


Example 3 

Identification of the cDNA Sequence 
of the Core Gene of 52 Isolates of HCV 


Viral RNA extraction, cDNA synthesis and "nested" 
PCR were carried out as in Example 1. For the cDNA PCR 
assay, HCV-specific synthetic oligonucleotides deduced from 
previously determined sequences that flank the C gene were 
used. Amplified DNA was purified by gel electrophoresis 
followed by glass-milk extraction as described in Example 1 
or by electroelution and both strands were sequenced 
directly. In 44 of the 52 HCV isolates studied the 
procedures for direct sequencing described in Example 1 
were utilized. For a number of the HCV isolates 
confirmatory sequencing was performed with the Applied 
Biosystems 373A automated DNA sequencer and 8 HCV isolates 
of genotype I/1a or II/1b were sequenced exclusively by 
this method. All 73 negative control samples interspersed 
among the test samples were negative for HCV RNA. 

The amplified DNA fragment obtained in 50 of the 
52 HCV isolates was specifically designed to overlap with 
previously obtained 5' NC sequences (Bukh et al. (1992b) 
Proc. Natl. Acad. Sci. U.S.A. 89:4942-4946) and with the E1 
sequences disclosed herein at approximately 80 nucleotide 
positions each. A complete match was observed in 6033 of 
6035 overlapping nucleotides. Two discrepancies were 
observed in isolate US6 at nt 552 (C and T) and nt 561 (C 
and T) respectively. This may have been due to 
microheterogeneity at these nucleotide positions, since the 
remaining overlapping sequence was unique for isolate US6. 
In addition, there were 3 confirmed instances of 
microheterogeneity: nt 33 in isolate SA11 (C,T and T), nt 
36 in isolate S45 (A,C and A), and nt 552 in isolate P10 
(C,T and T). Overall, the excellent agreement in these 
overlapping sequences in this study with the NC sequences 
disclosed in Bukh et al. and with the E1 sequences 
disclosed herein definitively ruled out contamination as a 
source of non-authentic HCV sequences. Furthermore, this 
analysis proved that the sequences obtained were from a 
single population, and not from different populations as 
could happen in mixed infections. 

The core (C) gene was exactly 573 nucleotides in 
length in all 52 HCV isolates with an amino terminal start 
codon and no in- frame stop codons. Microheterogeneity was 
observed in 26 of the 52 HCV isolates at 0.2 -1.4% of the 
573 nucleotide positions of the C gene, and resulted in 
changes in 0. 5-1.0% of the 191 predicted amino acids in 12 
of these isolates. A multiple sequence alignment was 
performed and it showed that the nucleotide identities of 
the C gene among these HCV isolates were in the range of 
79.4-99.0%. In order to compare the genetic relatedness of 
HCV isolates in different gene regions, phylogenetic trees 
of the C gene of all 52 HCV isolates and the E1 gene of 51
HCV isolates were constructed using the unweighted pair-group
method with arithmetic mean (Nei, M. (1987) Molecular
Evolutionary Genetics (Columbia University Press, New York,
N.Y.), pp. 287-326) (Figure 8). In both dendrograms a
division of the 45 HCV isolates from which C and E1 genes
had been cloned into at least six major genetic groups
(genotypes 1-6) and 12 minor genetic groups (genotypes
I/1a, II/1b, III/2a, IV/2b, 2c, V/3a, 4a-4d, 5a, and 6a)
was observed. It is noteworthy that a major division in 
genetic distance between HCV isolates of genotype 2 and 
those of the other genotypes in the phylogenetic analyses 
of both gene sequences was observed. Furthermore, the 
divergence of the minor genotypes within genotype 2 
exhibited a degree of heterogeneity that is equivalent to 
that observed among the major genotypes. Analysis of the C 
gene from isolates Z5 and Z8, which had a unique 5' NC 
sequence (Bukh et al. (1992)) but from which the E1 gene
could not be amplified, revealed that these isolates
represented two additional genotypes. The designations 4e 
and 4f are assigned to these genotypes that have not been 
described previously. Overall, the present specification 
demonstrates that the genetic relatedness of HCV isolates 
is equivalent when analyzing the most conserved gene (C) 
and one of the most variable genes (E1) of the HCV genome,
thereby providing strong evidence for the suggested 
division into major and minor genotypes. 
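
The pairwise identity comparison and the unweighted pair-group clustering described above can be sketched as follows. This is a minimal illustration only, assuming pre-aligned sequences of equal length and using distance = 1 - identity; the function names and the three toy sequences are hypothetical and are not the software or the sequences used in this specification.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of aligned positions at which two sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def upgma(names, dist):
    """Build a UPGMA dendrogram.

    names: leaf labels; dist: square matrix of pairwise distances.
    Returns nested tuples of the form (left, right, height).
    """
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Working distance table keyed by unordered pairs of cluster ids
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters
        i, j = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        height = d[frozenset((i, j))] / 2
        for k in clusters:
            # Size-weighted mean equals the mean over all leaf pairs
            d[frozenset((next_id, k))] = (
                n_i * d[frozenset((i, k))] + n_j * d[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j, height), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical toy alignment (not sequences from this specification)
toy = {"A": "ATGAGCACGAATCCT", "B": "ATGAGCACGAATCCC", "C": "ATGAGTACAAACCCT"}
labels = list(toy)
matrix = [[1 - percent_identity(toy[p], toy[q]) / 100 for q in labels]
          for p in labels]
print(upgma(labels, matrix))
```

In this toy case the two sequences that differ at a single position are joined first, and the third sequence joins at a greater height, which is the behavior a dendrogram of closely and distantly related isolates would show.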

Example 4 

Computer Analysis of the Nucleotide and Deduced 
Amino Acid Sequences Of The Core Gene Of 52 HCV Isolates 

In order to study further the heterogeneity of 
the C gene, a consensus sequence of the core gene from the 
52 HCV isolates (Fig. 6J) was obtained. A total of 335 
(58.5%) of the 573 nucleotides of the C gene were invariant 
among these HCV isolates. Nucleotides at the 1st and 2nd 
codon positions were invariant at 70.7% and 81.7% of these 
positions, respectively, while nucleotides at the 3rd 
position were invariant at only 23.0% of such positions.
Stretches of 6 or more invariant nucleotides were observed
from nucleotides 1-8, 22-27, 85-92, 110-125, 131-141,
334-340, 364-371, 397-404, and 511-516 and may be suitable for
anchoring primers for amplification of HCV RNA in cDNA PCR
assays.
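
The invariance statistics described above (invariant columns, invariance by codon position, and stretches of 6 or more invariant nucleotides) can be computed from an alignment along the following lines. This is a sketch under stated assumptions: the sequences are already aligned in frame without gaps, the function names are hypothetical, and the three short sequences are made-up toy data rather than core gene sequences.

```python
def invariant_mask(alignment):
    """True at each column where every aligned sequence has the same base."""
    return [len(set(column)) == 1 for column in zip(*alignment)]


def invariance_by_codon_position(mask):
    """Fraction of invariant columns at codon positions 1, 2 and 3."""
    return [sum(mask[p::3]) / len(mask[p::3]) for p in range(3)]


def invariant_stretches(mask, min_len=6):
    """1-based inclusive (start, end) ranges of runs of invariant columns."""
    runs, start = [], None
    for i, invariant in enumerate(mask + [False]):  # sentinel closes a final run
        if invariant and start is None:
            start = i
        elif not invariant and start is not None:
            if i - start >= min_len:
                runs.append((start + 1, i))
            start = None
    return runs


# Hypothetical toy alignment of 7 codons (not data from this specification)
toy_alignment = [
    "ATGAGCACGAATCCTAAACCT",
    "ATGAGCACGAATCCCAAACCT",
    "ATGAGTACAAATCCTAAACCT",
]
mask = invariant_mask(toy_alignment)
print(sum(mask), "of", len(mask), "positions invariant")
print(invariance_by_codon_position(mask))
print(invariant_stretches(mask))
```

As in the core gene data, the toy variation falls at third codon positions, so the first two codon positions score higher than the third.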

Genotype-specific nucleotide positions of the 
core gene of hepatitis C virus were also noted for each of 
the genotypes. These genotype-specific nucleotides are
shown below where each genotype-specific nucleotide is 
given in parentheses next to the nucleotide position in 
which it is found. 

Genotype 1: 460 (C), 466 (C), 483 (C), 486 (G).
Genotype I/1a: 180 (T).
Genotype II/1b: 106 (C), 273 (G).
Genotype 2: 192 (C), 201 (A), 203 (A), 207 (G), 210 (C),
221 (A), 231 (A), 232 (A), 341 (A).
Genotype III/2a: 315 (C), 355 (G).
Genotype IV/2b: 45 (A), 174 (G), 216 (C), 348 (A), 376 (A),
414 (T).
Genotype 2c: 233 (G), 312 (C), 318 (A), 456 (C), 462 (G),
543 (C), 556 (T).
Genotype V/3a: 47 (T), 84 (A), 106 (G), 126 (A), 150 (T),
212 (G), 216 (A), 300 (A), 491 (T), 559 (C), 560 (A),
568 (G), 571 (A), 572 (G).
Genotype 4: 59 (T).
Genotype 4a: 213 (A), 231 (G), 415 (A).
Genotype 4b: 66 (G), 145 (G), 310 (A).
Genotype 4c: 213 (T), 219 (A), 270 (T).
Genotype 4d: 212 (T), 327 (G), 469 (C).
Genotype 4e: 199 (C), 306 (A), 326 (A).
Genotype 4f: 57 (T), 75 (A), 267 (A).
Genotype 5a: 291 (G), 294 (C).
Genotype 6a: 59 (C), 175 (A), 195 (A), 198 (A), 214 (C),
224 (A), 316 (C), 351 (G), 387 (G), 444-447 (GGCT),
450 (G), 471-472 (AA), 474 (C).

These genotype-specific nucleotides are of
utility in designing the genotype-specific PCR primers and
hybridization probes.
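
As an illustration of how the genotype-specific positions listed above might be applied in software, the following sketch checks a core gene sequence against a small subset of those positions. It is a hypothetical example only: the dictionary holds just a few of the listed entries, positions are taken as 1-based within the 573-nucleotide C gene, and the function name and the placeholder test sequence are assumptions made for illustration.

```python
# Subset of the genotype-specific positions listed in the text
# (1-based position within the C gene -> nucleotide)
GENOTYPE_MARKERS = {
    "I/1a": {180: "T"},
    "II/1b": {106: "C", 273: "G"},
    "III/2a": {315: "C", 355: "G"},
    "4": {59: "T"},
    "5a": {291: "G", 294: "C"},
}


def matching_genotypes(core_seq, markers=GENOTYPE_MARKERS):
    """Return genotypes whose listed positions all match the sequence."""
    hits = []
    for genotype, positions in markers.items():
        if all(
            pos <= len(core_seq) and core_seq[pos - 1] == nucleotide
            for pos, nucleotide in positions.items()
        ):
            hits.append(genotype)
    return hits


# Hypothetical placeholder sequence: 573 unknown bases with two markers set
bases = ["N"] * 573
bases[106 - 1] = "C"
bases[273 - 1] = "G"
print(matching_genotypes("".join(bases)))
```

A practical typing scheme would require every listed position for a genotype and would need to handle ambiguous bases; this sketch only shows the lookup.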

Finally, although the full-length nucleic acid
sequences of the C gene of isolates representing genotypes
I/1a, II/1b, III/2a, IV/2b and V/3a have been reported by
others, those of 9 of the 14 genotypes (i.e., 2c, 4a-4f, 5a
and 6a) have not been reported previously. In sum, by
aligning the consensus sequences of the major genotypes, 
the present application enables those skilled in the art to 
map universally conserved sequences as well as genotype- 
specific sequences of the C gene among 14 genotypes of HCV. 

In order to study the heterogeneity of the 
deduced C protein, a multiple sequence alignment of the 
predicted amino acids for all 52 HCV isolates was 
performed, and a consensus sequence was obtained (Fig. 7J).
The identities of the predicted 191 amino acids of the C
protein among these HCV isolates were in the range of
85.3-100.0%. A total of 132 (69.1%) of the 191 amino acids of
the C protein were invariant. The most prevalent amino
acids in the consensus sequence were glycine (13.6%),
arginine (12.6%), proline (11.0%), and leucine (9.9%). The
most conserved amino acids were tryptophan (5 of 5 amino
acids invariant), aspartic acid (5 of 5 amino acids
invariant), proline (19 of 21 amino acids invariant) and
glycine (23 of 26 amino acids invariant). Previous
analyses indicated that HCV is evolutionarily related to 
pestiviruses (Miller et al. (1990) Proc. Natl. Acad. Sci.
U.S.A. 87:2057-2061). In this regard, it is of interest to
note that the C proteins of both viruses have a high
content of proline residues (Collette, M.S. et al. (1988)
Virology 165:200-208), which are likely to be important in
maintaining the structure of this protein. As is
characteristic for a protein that binds to nucleic acid,
the C protein has conserved amino acids that are basic and
positively charged, and these are capable of neutralizing
the negative charge of the HCV RNA encapsidated by this
protein (Rice, C.M. et al. (1986) in Togaviridae and
Flaviviridae, eds. Schlesinger, S. & Schlesinger, M.J.
(Plenum Press, New York, N.Y.) pp. 279-326). Specifically,
over 16% of the amino acids in the consensus sequence of
the C protein of HCV are arginine and lysine that are
located primarily in three clusters (i.e., from amino
acids 6-23, 39-74 and 101-121) (Shih, C.M. et al. (1993) J.
Virol. 67:5823-5832) (Fig. 7J). The 10 arginine and
lysine residues within amino acids 39-62 are invariant
among all 52 HCV isolates, suggesting that this domain may
represent an important RNA-binding site. The capsid
proteins of the related flavi- and pestiviruses (Miller et
al. (1990)) also have a high content of arginine and lysine
(Rice et al. (1986); Collette et al. (1988)). Although

there are three major hydrophilic regions (i.e., amino 
acids 2-23, 39-74 and 101-121) that are conserved in all 52 
HCV isolates, the remainder of the C protein is 
hydrophobic. Interestingly, one such highly conserved 
hydrophobic domain from aa 24-39 is flanked by proline 
residues. The hydrophobic domains are likely to be 



involved in protein-protein and/or protein-RNA interactions 
during assembly of the nucleocapsid, as well as in 
interaction with the lipoprotein envelope, as has been 
suggested for flaviviruses (Rice et al. (1986)). Other
significant observations are: (i) a cluster of 5 invariant
tryptophan residues from aa 76-107; (ii) the lack of an
N-linked glycosylation site (N-X-T/S); (iii) two potential
nuclear localization signals (i.e., PRRGPR at amino acids
38-43 and PRGRRQP at amino acids 58-64) that are present in
all 52 HCV isolates (Shih et al. (1993)); and (iv) a
putative DNA-binding motif SPRG at amino acids 99-102,
found in 51 of the 52 HCV isolates, with SP present in all
52 isolates. This study demonstrates that the C protein
has features that are highly conserved among the various 
genotypes of HCV, and that are known to be characteristic 
of capsid proteins of other related viruses. 
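
The sequence features noted above (content of basic residues, N-X-T/S glycosylation sites, and short motifs such as PRRGPR or SPRG) lend themselves to a simple scan. The sketch below is illustrative only: the helper names are hypothetical, X is taken as any residue exactly as the motif is written in the text, and the 16-residue peptide is a made-up string rather than a core protein sequence.

```python
import re


def basic_fraction(protein):
    """Fraction of residues that are arginine (R) or lysine (K)."""
    return sum(protein.count(residue) for residue in "RK") / len(protein)


def n_glycosylation_sites(protein):
    """1-based positions of N in each N-X-T/S match (X = any residue)."""
    # Lookahead so that overlapping sites are all reported
    return [m.start() + 1 for m in re.finditer(r"(?=N.[TS])", protein)]


def find_motif(protein, motif):
    """1-based start positions of every occurrence of a literal motif."""
    pattern = "(?=" + re.escape(motif) + ")"
    return [m.start() + 1 for m in re.finditer(pattern, protein)]


# Hypothetical toy peptide (not a sequence from this specification)
toy_peptide = "MSTPRRGPRAASPRGK"
print(basic_fraction(toy_peptide))
print(n_glycosylation_sites(toy_peptide))
print(find_motif(toy_peptide, "PRRGPR"), find_motif(toy_peptide, "SPRG"))
```

Applied to a consensus core protein sequence, the same three calls would report the proportion of arginine plus lysine, confirm the absence of N-X-T/S sites, and locate the nuclear localization and SPRG motifs.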

It should also be noted that the phylogenetic 
analysis of the amino acid sequence of the C proteins was 
not capable of resolving the minor groups within genotypes 
1 and 4 because of the conservation of this protein (data 
not shown). Indeed, only a few type-specific amino acids
were identified. One striking example was that isolates of 
genotype 4 have an additional methionine at position 20 
that is specific for this major genetic group. Finally, 
the conservation of the sequences surrounding the cleavage 
site between the C and the E1 proteins of the different
genotypes, which has been determined to be between amino
acid 191 (alanine) and aa 192 (tyrosine) in HCV isolates of
genotype 1, was analyzed (Hijikata, M., et al. (1991) Proc.
Natl. Acad. Sci. USA 88:5547-5551). The C-terminal
sequence of C is serine-alanine in all but one of the 48 
HCV isolates comprising genotypes 1, 2, 4, 5 and 6. 

However, all 4 HCV isolates of genotype 3 in this study, as
well as isolates of genotype 3 published previously
(Okamoto, H., et al. (1993) J. Gen. Virol. 74:2385-2390;
Stuyver, L., et al. (1993) Biochem. Biophys. Res. Comm.
192:635-641), contain alanine-serine at this position.
Thus, studies will be needed to determine the C/E1 cleavage
site in genotype 3 isolates. Overall, the present
application discloses the mapping of universally
conserved sequences, as well as genotype-specific
sequences, of the C protein among 14 genotypes of HCV.

Implications of the mapping of universally 
conserved and genotype-specific core nucleotide
and amino acid core sequences for diagnosis of 
HCV infection and for determination of HCV 
genotypes 

Detection of antibodies directed against the HCV 
core protein is important in the diagnosis of HCV 
infection. The recombinant C22-3 protein, spanning amino 
acids 2-120 of the C gene, is a major component of the 
commercially available second-generation anti-HCV tests. 
Several studies have indicated that the three major 
hydrophilic regions of the C protein contain linear 
immunogenic epitopes (summarized in Sallberg, M. et al.
(1992) J. Clin. Microbiol. 30:1989-1994). For example,
antibodies against synthetic peptides from amino acids
1-18, 51-68 and 101-118 were detected in infected patients
(Sallberg, M. et al. (1992)). The present application
demonstrates that, while these immunogenic regions are
highly conserved, genotype-specific differences are
observed at several amino acid positions that may influence
the specificity and sensitivity of the serological tests.
One such example is that a single amino acid substitution
at amino acid 110 has been demonstrated to affect
sero-reactivity (Sallberg, et al. (1992)). Despite the high

degree of conservation in the immunodominant regions of the 
C protein among the different genotypes, it is possible 
that genetic heterogeneity of the C protein could lead to 
false negative results in current serological tests. 

With respect to genotype analysis, several 
methods have been used to determine the genotype of HCV 
isolates without resorting to sequence analysis. These
include PCR followed by: (i) amplification with
type-specific primers (Okamoto, H. et al. (1992) J. Gen. Virol.
73:673-679); (ii) determination of restriction-length
polymorphism (Simmonds, P. et al. (1993) J. Gen. Virol.
74:661-668); and (iii) specific hybridization (Stuyver, L.
(1993) J. Gen. Virol. 74:1093-1102). The proposed methods
have primarily been based on 5' NC and C sequences.
Previous studies suggested that 5' NC-based genotyping
systems would only be predictive of the major genetic
groups of HCV (Bukh, J., et al. (1992) Proc. Natl. Acad.
Sci. USA 89:4942-4946; Bukh, J., et al. (1993) Proc.
Natl. Acad. Sci. USA 90:8234-8238). The most widely used
C-based genotype system has been the PCR assay with
type-specific primers that was designed for distinguishing HCV
isolates of genotypes I/1a, II/1b, III/2a, IV/2b and V/3a
(Okamoto, H., et al. (1993) J. Gen. Virol. 74:2385-2390;
Okamoto, H. et al. (1992) J. Gen. Virol. 73:673-679).

Since this system was developed prior to the identification
of genotypes 2c, 4a-4f, 5a and 6a, there are significant
limitations to this typing system. For example, the
primers specific for genotype IV/2b (nt 270-251) are as
highly conserved within isolates of genotypes 4c and 6a as
within the isolates of genotype IV/2b. Thus, this assay
probably cannot distinguish among these genotypes. Another
C-based approach involves distinguishing between genotypes
1 and 2 by type-specific antibody responses (Machida et al.
(1992) Hepatology 16:886-891). Synthetic peptides
composed of amino acids 65-81 were found to be genotype- 
specific for genotypes 1 and 2 in ELISA assays. The 
present analysis of amino acid sequences demonstrated 
significant variation within isolates of genotypes 1 and 2. 
Thus it is likely that these peptides will not identify all 
isolates of genotypes 1 and 2. Furthermore, the peptide 
for genotype 1 was highly conserved within isolates of 
genotypes 3 and 4 and might detect antibodies against these 
genotypes as well. Finally, it should be pointed out that 



most isolates of genotypes 3 and 4 had an identical amino 
acid sequence at positions 65-81. 

Example 5 

Detection by ELISA Based on Antigen from
Insect Cells Expressing Complete E1 Or Core Protein

Expression of E1 or Core protein in SF9 cells. A
cDNA (e.g., SEQ ID NO:1) encoding a complete E1 protein (e.g.,
SEQ ID NO:52) or a cDNA (e.g., SEQ ID NO:103) encoding a
complete core protein (e.g., SEQ ID NO:155) is subcloned
into pBlueBac transfer vector (Invitrogen) using standard
subcloning procedures. The resultant recombinant
expression vector is cotransfected into SF9 insect cells
(Invitrogen) by the Ca precipitation method according to
the Invitrogen protocol.

ELISA Based on Infected SF9 cells. 5 x 10^6 SF9
cells infected with the above-described recombinant
expression vector are resuspended in 1 ml of 10 mM
Tris-HCl, pH 7.5, 0.15 M NaCl and are then frozen and thawed 3
times. 10 µl of this suspension is dissolved in 10 ml of
carbonate buffer (pH 9.6) and used to cover one flexible
microtiter assay plate (Falcon). Serum samples are diluted
1:20, 1:400 and 1:8000, or 1:100, 1:1000 and 1:10000.
Blocking and washing solutions for use in the ELISA assay
are PBS containing 10% fetal calf serum and 0.5% gelatin
(blocking solution) and PBS with 0.05% Tween-20 (Sigma,
St. Louis, MO) (washing solution). As a secondary antibody,
peroxidase-conjugated goat IgG fraction to human IgG or
horseradish peroxidase-labelled goat anti-Old or anti-New
World monkey immunoglobulin is used. The results are
determined by measuring the optical density (O.D.) at 405
nm.

To determine if insect cell-derived E1 or core
protein representing genotype I/1a of HCV could detect
anti-HCV antibody in chimpanzees infected with genotype I/1a of
HCV, three infected chimpanzees are examined. All 3
chimpanzees are found to seroconvert to anti-HCV.


Example 6 

Use of the Complete
E1 Protein as a Vaccine

Mammals are immunized with purified or partially
purified E1 protein in an amount sufficient to stimulate
the production of protective antibodies. The immunized
mammals challenged with various genotypes of HCV are
protected.

It is understood by one skilled in the art that
the recombinant E1 protein used in the above vaccine can
also be used in combination with other recombinant E1
proteins having an amino acid sequence shown in SEQ ID
NOs: 52-102. In addition, recombinant core proteins having
an amino acid sequence shown in SEQ ID NOs: 155-206 could
also be used in the above vaccine, either alone, in
combination with other recombinant core proteins of the
present invention, or in combination with recombinant E1
proteins having an amino acid sequence shown in SEQ ID
NOs: 52-102.


Example 7 

Determination of the Genotype of an HCV
Isolate Via Hybridization of Genotype-Specific
Oligonucleotides to RT-PCR Amplification Products.

Viral RNA is isolated from serum obtained from a
mammal and is subjected to RT-PCR as in Example 1 or
Example 3. Following amplification, the amplified DNA is
purified as described in Example 1 or Example 3 and
aliquots of 100 µl of amplification product are applied to
dots on a nitrocellulose filter set in a dot blot
apparatus. The filter is then cut into separate dots and
each dot is hybridized to a 32P-labelled oligonucleotide
specific for a single genotype of HCV. The
oligonucleotides to be used as hybridization probes are
deduced from the consensus sequences shown in Figures 1A-1H
or 6A-6J or from the SEQ ID NOs: representing E1 or core
sequences comprising genotypes 4a-4f, 2c and 6a.

Example 8 

ELISA Based on Synthetic
Peptides Derived From E1 cDNA Sequences

E1 peptide(s) specific for genotype I/1a is
placed in 0.1% PBS buffer and 50 µl of a 1 mg/ml solution of
peptide is used to cover each well of the microtiter assay
plate. Serum samples from two mammals infected with
genotype I/1a HCV and from one mammal infected with
genotype 5a HCV are diluted as in Example 3 and the ELISA
is carried out as in Example 3. Both mammals infected with
genotype I/1a HCV react positively with peptides while the
mammal infected with genotype 5a HCV exhibits no
reactivity. One skilled in the art would readily
understand that in the above experiment, core peptides
specific for genotype I/1a could be used in place of, or in
combination with, the E1 genotype-specific peptide(s).

Example 9 

Use of E1 Peptides as a Vaccine

Since the E1 genotype-specific peptides of the
present invention are derived from two variable regions in
the complete E1 protein, there exists support for the use
of these peptides as a vaccine to protect against a variety
of HCV genotypes. Mammals are immunized with peptide(s)
selected from SEQ ID NOs: 136-159 in an amount sufficient
to stimulate production of protective antibodies. The
immunized mammals challenged with various genotypes of HCV
are protected. One skilled in the art would readily
understand that genotype-specific core peptides of the
present invention could also be used either alone, in
combination with each other, or in combination with the
genotype-specific E1 peptides, as a vaccine to protect
against a variety of HCV genotypes. In addition, the above
vaccines may also be formulated using the universal core
and/or E1 peptides of the present invention.
SEQUENCE LISTING

GENERAL INFORMATION:

(i) APPLICANTS: BUKH, J., MILLER, R.H. AND PURCELL, R.H.

(ii) TITLE OF INVENTION: NUCLEOTIDE AND DEDUCED
AMINO ACID SEQUENCES OF THE ENVELOPE 1 AND
CORE GENES OF ISOLATES OF HEPATITIS C VIRUS
AND THE USE OF REAGENTS DERIVED FROM THESE
SEQUENCES IN DIAGNOSTIC METHODS AND
VACCINES

(iii) NUMBER OF SEQUENCES: 263

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: MORGAN & FINNEGAN, L.L.P.
(B) STREET: 345 PARK AVENUE
(C) CITY: NEW YORK
(D) STATE: NEW YORK
(E) COUNTRY: USA
(F) ZIP: 10154

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: FLOPPY DISK
(B) COMPUTER: IBM PC COMPATIBLE
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: WORDPERFECT 5.1

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: TO BE ASSIGNED
(B) FILING DATE: 26-MAY-1998

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 08/290,665
(B) FILING DATE: 15-AUG-1994

(viii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 08/086,428
(B) FILING DATE: 29-JUNE-1993

(ix) ATTORNEY/AGENT INFORMATION:
(A) NAME: RICHARD W. BORK
(B) REGISTRATION NUMBER: 36,459
(C) REFERENCE/DOCKET NUMBER: 2026-4116US2

(x) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (212) 758-4800
(B) TELEFAX: (212) 751-6849

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK7 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 1 : 


TAC CAA GTG CGC AAC TCC ACG GGG CTT TAC CAT GTC ACC 39

AAT GAT TGC CCT AAC TCG AGT ATC GTG TAC GAG GCG GCC 78 

GAT GCC ATC CTG CAC ACT CCG GGG TGT GTC CCT TGC GTT 117 

CGC GAG GGT AAC GTC TCG AGG TGT TGG GTG GCG ATG ACC 156 

CCC ACG GTG GCC ACC AGG GAT GGC AAA CTC CCC ACA GCG 195 

CAG CTT CGA CGT CAC ATC GAT CTG CTC GTC GGG AGT GCC 234 

ACC CTC TGT TCG GCC CTC TAC GTG GGG GAC CTG TGC GGG 273 

TCT GTC TTT CTT GTC GGT CAA CTG TTT ACC TTC TCT CCC 312 

AGG CGC CAC TGG ACG ACG CAA GGC TGC AAT TGT TCT ATC 351

TAT CCT GGC CAT ATA ACG GGT CAC CGC ATG GCG TGG GAT 390 

ATG ATG ATG AAC TGG TCC CCT ACC ACG GCG TTG GTA GTA 429 

GCT CAG CTG CTC CGG ATC CCG CAA GCC ATC TTG GAC ATG 468 

ATC GCT GGT GCT CAC TGG GGA GTC CTG GCG GGC ATA GCG 507 

TAT TTT TCC ATG GTG GGG AAC TGG GCG AAG GTC CTG GTA 546 

GTG CTG CTG CTA TTT GCC GGC GTC GAC GCG 576 
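
The rows of the listing carry a running nucleotide total at the end of each line, which makes transcription easy to check by machine. The following sketch joins such rows, verifies each running total, and translates the result with the standard genetic code. It is offered as an illustration only: the function names are hypothetical, and the two rows shown are the first two rows of SEQ ID NO: 1 above.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def parse_listing_rows(rows):
    """Join triplet rows, checking each trailing cumulative nucleotide count."""
    sequence = ""
    for row in rows:
        *codons, count = row.split()
        sequence += "".join(codons)
        if len(sequence) != int(count):
            raise ValueError(
                f"running total {count} does not match {len(sequence)} nt"
            )
    return sequence


def translate(sequence):
    """Translate an in-frame nucleotide sequence; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE[sequence[i:i + 3]] for i in range(0, len(sequence) - 2, 3)
    )


# First two rows of SEQ ID NO: 1 as given in the listing
rows = [
    "TAC CAA GTG CGC AAC TCC ACG GGG CTT TAC CAT GTC ACC 39",
    "AAT GAT TGC CCT AAC TCG AGT ATC GTG TAC GAG GCG GCC 78",
]
nucleotides = parse_listing_rows(rows)
print(len(nucleotides), translate(nucleotides))
```

Run over all rows of an entry, the same check would confirm the stated length of 576 base pairs (192 codons) for each E1 sequence.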


(2) INFORMATION FOR SEQ ID NO : 2 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK9 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 2 : 


TAC CAA GTA CGC AAC TCC TCG GGC CTC TAC CAT GTC ACC 39

AAT GAT TGC CCT AAC TCG AGT ATT GTG TAC GAG GCG GCC 78 

GAT GCC ATC CTG CAT TCT CCA GGG TGT GTC CCT TGC GTT 117 

CGC GAG GGT AAC GCC TCG AAA TGT TGG GTG GCG GTG GCC 156 

CCC ACG GTG GCC ACC AGG GAC GGC AAG CTC CCC GCA ACG 195 

CAG CTT CGA CGT CAC ATC GAT CTG CTT GTC GGG AGC GCC 234 

ACC CTC TGC TCG GCC CTC TAT GTG GGG GAC TTG TGC GGG 273 

TCT GTC TTC CTT GTC GGC CAA CTG TTC ACC TTC TCC CCC 312
AGA CGC CAC TGG ACA ACG CAA GAC TGC AAC TGT TCT ATC 351
TAC CCC GGC CAT ATT ACG GGT CAT CGC ATG GCG TGG GAT 390
ATG ATG ATG AAC TGG TCC CCT ACA GCA GCG CTG GTA ATG 429
GCG CAG CTG CTC AGG ATC CCG CAG GCC ATC TTG GAC ATG 468
ATC GCT GGT GCC CAC TGG GGA GTC CTA GCG GGC ATA GCG 507
TAT TTC TCC ATG GTG GGG AAC TGG GCG AAG GTC GTG GTG 546
GTA CTG TTG CTG TTT ACC GGC GTC GAT GCG 576

(2) INFORMATION FOR SEQ ID NO: 3:




(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 
(C) INDIVIDUAL ISOLATE: DR1 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 3 : 


CAC CAA GTG CGC AAC TCT ACA GGG CTT TAC CAT GTC ACC 39
AAT GAT TGC CCT AAT TCG AGT ATT GTG TAC GAG GCG GCC 78
GAT GCC ATC CTG CAC GCG CCG GGG TGT GTC CCT TGC GTT 117
CGC GAG GGT AAC GCC TCG AGG TGT TGG GTG GCG GTG ACC 156
CCC ACG GTG GCC ACC AGG GAC GGC AAA CTC CCC ACA ACG 195
CAG CTT CGA CGT CAC ATC GAC CTG CTT GTC GGG AGC GCC 234
ACC CTC TGC TCG GCC CTC TAC GTG GGG GAC CTG TGC GGG 273
TCT GTC TTC CTT GTC GGT CAA CTG TTC ACC TTT TCT CCC 312
AGG CGC CAC TGG ACA ACG CAA GAC TGC AAT TGT TCT ATC 351
TAT CCC GGC CAT ATA ACG GGA CAC CGT ATG GCA TGG GAT 390
ATG ATG ATG AAC TGG TCC CCT ACG ACA GCG CTG GTA ATG 429
GCT CAG CTG CTC CGG ATC CCA CAA GCC ATC TTG GAC ATG 468
ATC GCT GGA GCC CAC TGG GGA GTC CTA GCG GGC ATA GCG 507
TAT TTC TCC ATG GTG GGG AAC TGG GCG AAG GTC GTG GTA 546
GTG CTG TTG CTG TTT GCC GGC GTT GAT GCG 576

(2) INFORMATION FOR SEQ ID NO: 4:


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 
(C) INDIVIDUAL ISOLATE: DR4 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:


CAC CAA GTG CGC AAC TCT ACA GGG CTT TAC CAT GTC ACC 39
AAT GAT TGC CCT AAT TCG AGT ATT GTG TAC GAG GCG GCC 78
GAT GCC ATC CTG CAC ACG CCG GGG TGT GTC CCT TGC GTT 117
CGC GAG GGT AAC ACC TCG AGG TGT TGG GTG GCG GTG ACC 156
CCC ACG GTG GCC ACC AGG GAC GGC AAA CTC CCC ACA ACG 195
CAG CTC CGA CGT CAC ATC GAC CTG CTT GTC GGG AGC GCC 234
ACC CTC TGC TCG GCC CTC TAC GTG GGG GAC TTG TGC GGG 273
TCT GTC TTC CTT GTC GGT CAA CTG TTC ACC TTC TCT CCC 312
AGG CAC CAC TGG ACA ACG CAA GAC TGC AAT TGT TCC ATC 351
TAT CCC GGC CAT ATA ACG GGC CAC CGC ATG GCG TGG GAT 390
ATG ATG ATG AAC TGG TCC CCT ACG ACA GCG CTG GTA GTA 429
GCT CAG CTG CTC CGG ATC CCA CAA GCC ATC TTG GAC ATG 468
ATC GCT GGT GCC CAC TGG GGA GTC CTA GCG GGC ATA GCG 507
TAT TTC TCC ATG GTG GGG AAC TGG GCG AAG GTC CTG GTA 546
GTG CTG TTG CTG TTT GCC GGC GTT GAT GCG 576

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: S14 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 5 : 


TAC CAA GTG CGC AAC TCC ACG GGG CTT TAC CAT GTT ACC 39
AAT GAT TGC CCT AAC TCG AGT ATT GTG TAC GAG ACA GCT 78
GAT GCT ATC CTA CAC GCT CCG GGA TGT GTC CCT TGC GTT 117
CGT GAG GGT AAC ACC TCG AGG TGT TGG GTG GCG ATG ACC 156
CCC ACG GTG GCC ACC AGG GAC GGC AAA CTC CCC GCA ACG 195
CAG CTT CGA CGT TAC ATC GAT CTG CTT GTC GGG AGC GCC 234
ACC CTC TGT TCG GCC CTC TAC GTG GGG GAC TTG TGC GGG 273
TCT GTC TTT CTT GTC GGT CAG CTG TTT ACC TTC TCT CCC 312
AGG CGC CTC TGG ACG ACG CAA GAC TGC AAT TGT TCT ATC 351
TAT CCC GGC CAT ATA ACG GGT CAT CGC ATG GCA TGG GAT 390
ATG ATG ATG AAC TGG TCC CCT ACG ACG GCA CTG GTA GTA 429
GCT CAG CTG CTC CGG ATC CCA CAA GCC ATC TTG GAT ATG 468
ATC GCT GGT GCT CAC TGG GGA GTC CTA GCG GGC ATA GCG 507
TAT TTC TCC ATG GTG GGA AAC TGG GCG AAG GTC CTA GTG 546
GTG CTG CTG CTA TTC GCC GGC GTT GAC GCG 576

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 6 : 


TAC CAA GTA CGC AAC TCC ACG GGC CTT TAC CAT GTC ACC 39
AAT GAC TGC CCT AAC TCG AGC ATT GTG TAC GAG ACG GCC 78
GAT ACC ATC CTA CAC TCT CCG GGG TGT GTC CCT TGC GTT 117
CGC GAG GGT AAC GCC TCG AGA TGT TGG GTG CCG GTG GCC 156
CCC ACA GTT GCC ACC AGG GAC GGC AAA CTC CCC GCA ACG 195
CAG CTT CGA CGT CAC ATC GAT CTG CTT GTT GGG AGC GCC 234
ACC CTC TGC TCG GCC CTC TAT GTG GGG GAC CTG TGC GGG 273
TCT GTC TTT CTT GTC AGC CAG CTG TTC ACT ATC TCC CCC 312
AGG CGC CAC TGG ACA ACG CAA GAC TGC AAC TGT TCT ATC 351
TAC CCC GGC CAT ATA ACG GGT CAC CGT ATG GCA TGG GAT 390
ATG ATG ATG AAC TGG TCC CCT ACA ACG GCG TTG GTA ATA 429
GCT CAG CTG CTC AGG GTC CCG CAA GCC GTC TTG GAC ATG 468
ATC GCT GGT GCC CAC TGG GGA GTC CTA GCG GGC ATA GCG 507
TAT TTC TCC ATG GCG GGG AAC TGG GCG AAG GTC CTG CTA 546
GTG CTG TTG CTG TTT GCC GGC GTC GAT GCG 576


(2) INFORMATION FOR SEQ ID NO : 7 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SW1 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 7 : 


TAC CAA GTA CGC AAC TCC TCG GGC CTT TAC CAT GTC ACC 39
AAT GAT TGC CCT AAC TCG AGT ATT GTG TAC GAG ACG GCC 78
GAT GCC ATT CTA CAC TCT CCA GGG TGT GTC CCT TGC GTT 117
CGC GAG GAT GGC GCC CCG AAG TGT TGG GTG GCG GTG GCC 156
CCC ACA GTC GCC ACT AGG GAC GGC AAA CTC CCT GCA ACG 195
CAG CTT CGA CGT CAC ATC GAT CTG CTT GTC GGA AGC GCC 234
ACC CTC TGC TCG GCC CTC TAC GTG GGG GAC TTG TGC GGG 273
TCT GTC TTT CTC GTC AGT CAA CTG TTC ACG TTC TCC CCC 312
AGG CGC CAC TGG ACA ACG CAA GAC TGT AAC TGT TCT ATC 351
TAT CCC GGC CAC ATA ACG GGT CAC CGC ATG GCA TGG GAT 390 
ATG ATG ATG AAC TGG TCC CCC ACA ACA GCG CTG GTA GTA 429 
GCT CAG CTG CTC AGG ATC CCG CAA GCC GTC TTG GAC ATG 468 
ATC GCT GGT GCC CAC TGG GGA GTC CTA GCG GGC ATA GCG 507 
TAT TTC TCC ATG GTG GGG AAC TGG GCG AAG GTC CTG ATA 546 
GTG CTG TTG CTG TTT TCC GGC GTC GAT GCG 576 


(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: US11 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 8 : 


TAC CAA GTA CGC AAC TCC ACG GGG CTT TAC CAT GTC ACC 39 

AAT GAT TGC CCT AAC TCG AGT ATT GTG TAC GAG GCG GCC 78 

GAT GCC ATC CTG CAC ACT CCG GGG TGT GTT CCT TGC GTT 117 

CGC GAG GGT AAC GCT TCG AGG TGT TGG GTG GCG ATG ACC 156 

CCC ACG GTG GCC ACC AGG GAC GGC AAA CTC CCC ACA ACG 195 

CAA CTT CGA CGT CAC ATC GAT CTG CTT GTC GGG AGC GCC 234 

ACC CTC TGT TCG GCC CTC TAC GTG GGG GAC CTG TGC GGG 273

TCT GTC TTT CTT GTC GGT CAA CTG TTT ACC TTC TCT CCC 312 

AGA CGC CAC TGG ACG ACG CAG GGC TGC AAT TGT TCT ATC 351 

TAT CCC GGC CAT ATA ACG GGT CAC CGC ATG GCA TGG GAT 390 

ATG ATG ATG AAC TGG TCC CCT ACG GCG GCG TTG GTG GTA 429 

GCT CAG CTG CTC CGG ATC CCA CAA GCC ATC TTG GAC ATG 468 

ATC GCT GGT GCT CAC TGG GGA GTC CTA GCG GGC ATA GCG 507 

TAT TTC TCC ATG GTG GGG AAC TGG GCG AAG GTC CTG GTA 546 

GTG CTG CTG CTA TTT GCC GGC GTC GAC GCG 576


(2) INFORMATION FOR SEQ ID NO : 9 : 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: D1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 9 : 
TAT GAA GTG CGC AAC GTG TCC GGG GTG TAC CAT GTC ACG 39

AAC GAC TGT TCC AAC TCG AGC ATT GTG TAT GAG ACA GCG 78 

GAC ATG ATC ATG CAC ACC CCC GGG TGC GTG CCC TGC GTT 117 

CGG GAG GAC AAC TCC TCT CGC TGC TGG GTA GCG CTC ACC 156 

CCC ACG CTC GCG GCT AGG AAT GGC AAC GTC CCC ACT ACG 195

GCG ATA CGA CGC CAC GTC GAT TTG CTC GTT GGG GCG GCT 234

GCT TTC TGC TCC GCC ATG TAC GTG GGG GAT CTC TGC GGA 273

TCT GTT TTC CTC ATC TCC CAG CTG TTC ACC CTC TCG CCT 312

CGC CGG CAT GAG ACG GTA CAG GAG TGT AAT TGC TCA ATC 351

TAT CCC GGC CAC GTG ACA GGT CAC CGT ATG GCT TGG GAT 390 

ATG ATG ATG AAC TGG TCA CCT ACA ACA GCC TTA GTG GTA 429 

TCG CAG TTA CTC CGG ATC CCA CAA GCT GTC ATG GAC ATG 468 

GTG GCG GGG GCC CAC TGG GGG GTC CTG GCG GGC CTC GCC 507 

TAC TAT TCC ATG GTG GGG AAC TGG GCT AAG GTT TTG ATT 546

GTG ATG CTA CTC TTT GCT GGC GTT GAC GGC 576 


(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: D3 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 


TAT GAA GTG CGC AAC GTG TCC GGG GTG TAC CAA GTC ACC 39 

AAT GAC TGT TCC AAC TCG AGC ATC GTG TAT GAG ACA GCG 78 

GAC ATG ATC ATG CAC ACC CCC GGG TGC GTG CCC TGC GTT 117 

CGG GAG GAC AAC TCC TCT CGC TGC TGG GTA GCG CTC ACC 156 

CCC ACG CTC GCG GCT AGG AAT AGC AGC GTC CCC ACT ACG 195 

ACA ATA CGA CGC CAC GTC GAT TTG CTC GTT GGG GCG GCT 234

GCT TTC TGC TCC GCC ATG TAC GTG GGG GAT CTT TGC GGA 273 

TCT GTT TTC CTC GTC TCC CAG CTG TTC ACC TTC TCG CCT 312 

CGC CGG CAT GAG ACA GTA CAG GAA TGT AAC TGC TCA ATC 351 

TAT CCC GGC CAC GTG ACA GGT CAC CGC ATG GCT TGG GAT 390 

ATG ATG ATG AAC TGG TCG CCT ACA GCA GCC CTA GTG GTA 429 

TCG CAG TTA CTC CGG ATC CCA CAA GCT GTC GTG GAC ATG 468 

GTG GCG GGG GCC CAC TGG GGG GTC CTG GCG GGC CTC GCC 507

TAC TAT TCC ATG GTG GGG AAC TGG GCT AAG GTT TTG ATT 546 

GTG ATG CTA CTC TTT GCT GGC GTC GAC GGC 576 


(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 576 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 


TAT GAA GTG CGC AAC GTG TCC GGG GTG TAC CAC GTC ACA 39 

AAC GAC TGC TCC AAC TCA AGC ATC GTG TAT GAG GCA GTG 78 

GAC GTG ATC ATG CAT ACC CCA GGG TGC GTG CCC TGC GTT 117 

CGG GAG AAC AAC CAC TCC CGT TGC TGG GTA GCG CTC ACC 156

CCC ACG CTC GCG GCC AGG AAC GCC AGC ATC CCC ACT ACG 195 

ACA ATA CGA CGC CAT GTC GAT TTG CTC GTT GGG GCG GCT 234 

GCT TTC TGC TCC GCT ATG TAC GTG GGG GAC CTC TGC GGA 273 

TCC GTT TTC CTC GTC TCT CAG CTG TTC ACC TTT TCA CCT 312 

CGC CGG CAT GAG ACA GCA CAG GAC TGC AAC TGC TCA ATC 351 

TAT CCC GGC CAC GTT TCA GGT CAC CGC ATG GCT TGG GAT 390 

ATG ATG ATG AAC TGG TCA CCT ACA ACA GCC CTA GTG CTA 429 

TCG CAG TTA CTC CGA ATC CCA CAA GCT GTC GTG GAC ATG 468

GTG GCG GGG GCC CAC TGG GGA GTC CTG GCG GGC CTC GCC 507 

TAC TAC TCC ATG GCG GGG AAC TGG GCC AAG GTT TTA ATT 546 

GTG TTG CTA CTC TTT GCC GGC GTT GAT GGG 576 


(2) INFORMATION FOR SEQ ID NO:12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: HK3 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 


TAT GAA GTG CGC AAC GTG TCC GGG ATA TAC CAT GTC ACG 39 

AAC GAC TGC TCC AAC TCA AGC GTC GTG TAT GAG ACA GCA 78 

GAC ATG ATC ATG CAT ACC CCT GGA TGC GTG CCC TGC GTA 117 

CGG GAG AAC AAC TCC TCC CGC TGT TGG GTA GCG CTC ACT 156 

CCC ACG CTC GCG GCC AGG AAC GTC AGC GTC CCC ACC ACG 195 

ACA ATA CGA CGT CAC GTC GAC TTG CTC GTT GGG GCG GCT 234 

GCC TTC TGC TCC GCT ATG TAC GTG GGG GAT CTC TGC GGA 273 

TCT GTT TTC CTT GTC TCC CAG CTG TTC ACC TTC TCG CCT 312 

CGC CGA CAC GAG ACA GTA CAG GAC TGC AAC TGC TCA CTC 351 

TAT CCC GGC CAC GTA TCA GGT CAC CGC ATG GCT TGG GAT 390 

ATG ATG ATG AAC TGG TCC CCT ACA GCA GCC CTA GTG GTG 429
TCG CAA TTA CTC CGG ATC CCG CAA GCT GTC GTG GAC ATG 468 
GTG GCG GGG GCC CAC TGG GGA GTC CTA GCG GGC CTT GCC 507 
TAC TAT TCC ATG GTG GGA AAC TGG GCT AAG GTT TTG ATT 546 
GTG ATG CTA CTT TTT GCC GGC GTT GAT GGG 576 


(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: HK4 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 


CAT GAA GTG CAC AAC GTA TCC GGG ATC TAC CAT GTC ACG 39 
AAC GAC TGC TCC AAC TCA AGT ATT GTG TAT GAG GCA GCG 78 
GAC ATG ATC ATG CAT ACC CCC GGG TGC GTG CCC TGC GTC 117 
CGG GAG AAC AAC TCC TCC CGT TGC TGG GTA GCG CTC ACT 156 
CCC ACG CTC GCG GCC AGG AAC GCC AGC ATC CCC ACT ACG 195 
ACA ATA CGA CGC CAT GTC GAC TTG CTC GTT GGG GCG GCT 234 
GCT TTC TGC TCC GCC ATG TAC GTG GGA GAT CTC TGC GGA 273 
TCT GTC TTC CTC GTC TCC CAG TTG TTC ACC TTC TCG CCT 312 
CGC CGG CAT GAG ACG GTA CAG GAC TGC AAT TGC TCA ATC 351 
TAT CCC GGC CAC GTA TCA GGT CAC CGC ATG GCT TGG GAT 390 
ATG ATG ATG AAC TGG TCA CCT ACA GCA GCC CTA GTG GTA 429 
TCG CAG TTA CTC CGA CTC CCA CAA GCT GTC ATG GAC ATG 468 
GTG GCG GGA GCC CAC TGG GGA GTC CTA GCG GGC CTT GCT 507 
TAC TAT TCC ATG GTG GGG AAC TGG GCC AAG GTT TTG ATT 546 
GTG ATG CTA CTC TTT GCC GGC GTT GAC GGG 576 


(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: HK5 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

TAT GAA GTG CGC AAC GTG TCC GGG GTA TAC CAT GTC ACG 39

AAC GAC TGC TCC AAC TTA AGC ATC GTG TAC GAG ACA ACG 78 

GAC ATG ATC ATG CAC ACC CCT GGG TGC GTG CCC TGC GTT 117 

CGG GAA AAC AAC TCC TCC CGT TGT TGG GTA GCG CTC GCC 156 

CCC ACG CTC GCG GCC AGG AAC GCC AGC GTC CCC ACC ACG 195 

GCA ATA CGA CGC CAC GTC GAC TTG CTC GTT GGG GCG GCT 234 

GCT TTC TGC TCC GCT ATG TAC GTG GGG GAT CTT TGC GGA 273 

TCT GTT TTC CTC GTC TCC CAG CTG TTC ACC TTC TCG CCT 312

CGC CGA CAC GAG ACG GTA CAG GAC TGC AAC TGC TCA ATC 351 

TAT CCC GGC CAC GTA ACA GGT CAC CGC ATG GCT TGG GAT 390 

ATG ATG ATG AAC TGG TCA CCT ACA ACA GCC CTA GTG GTG 429 

TCG CAG TTA CTC CGG ATC CCG CAA GCT GTC GTG GAC ATG 468 

GTA GCG GGG GCC CAC TGG GGG GTC CTG GCG GGC CTT GCC 507 

TAC TAT TCC ATG GTG GGA AAC TGG GCT AAG GTT TTG ATT 546 

GTG ATG CTA CTT TTT GCC GGC GTT GAT GGG 576


(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: HK8 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:


TAT GAA GTG CGC AAC GTG TCC GGG ATA TAC CAT GTC ACG 39 

AAC GAC TGC TCC AAC TCA AGC ATC GTG TAT GAA ACA GCG 78 

GAC ATG ATT ATG CAT ACC CCT GGA TGC ATG CCC TGC GTT 117 

CGG GAG AAC AAC TCC TCC CGT TGC TGG GTG GCG CTC ACT 156 

CCC ACG CTC GCG GCT AGG AAT GTC AGC GTC CCC ACT ACG 195 

ACA ATA CGA CGC CAC GTC GAC TTG CTC GTT GGG GCG GCT 234 

GCT TTC TGC TCC GCT ATG TAC GTG GGG GAT CTC TGC GGA 273

TCT GTT TTC CTC GTC TCC CAG CTG TTC ACC TTT TCG CCT 312 

CGC CGA CAC GAG ACG GTA CAG GAC TGC AAC TGC TCA ATC 351 

TAT CCC GGC CAC GTA TCA GGT CAC CGC ATG GCT TGG GAT 390 

ATG ATG ATG AAC TGG TCG CCC ACA ACA GCC CTA GTG GTG 429 

TCG CAG TTA CTC CGG ATC CCG CAA GCT ATC GTG GAC ATG 468 

GTG GCG GGG GCC CAC TGG GGA GTC CTA GCG GGC CTT GCC 507 

TAC TAT TCC ATG GTG GGC AAC TGG GCT AAG GTT TTG ATT 546 

GTG ATG CTA CTG TTT GCC GGC GTT GAT GGG 576


(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: IND5 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: 


TAT GAA GTG CGC AAC GTG TCC GGG GTG TAC CAT GTC ACG 39 
AAC GAC TGC TCC AAC TCA AGT ATT GTG TAT GAG GCA GCG 78 
GAC ATG ATC ATG CAC ACT CCC GGG TGC GTG CCC TGC GTT 117 
CGG GAG GGC AAC TCC TCT CGC TGC TGG GTA GCG CTC ACT 156 
CCC ACT CTC GCG GCC AGG AAC GCC AGC GTC TCC ACC ACG 195 
ACA ATA CGA CAC CAC GTC GAT TTG CTC GTT GGG GCG GCT 234 
GCT TTC TGT TCC GCT ATG TAC GTG GGG GAT CTA TGC GGA 273 
TCT GTT TTC CTC GTC TCC CAG CTG TTC ACC TTC TCA CCG 312 
CGC CGG CAT GAG ACA GTA CAG GAC TGC AAT TGC TCC ATC 351 
TAT CCC GGC CAC GTA TCA GGT CAC CGC ATG GCC TGG GAT 390 
ATG ATG ATG AAC TGG TCA CCT ACA GCA GCC CTA GTG GTA 429 
TCG CAG TTG CTC CGG ATC CCA CAA GCT GTC GTG GAT ATG 468 
GTG GCG GGG GCC CAC TGG GGA ATC CTG GCG GGC CTT GCC 507 
TAC TAT TCC ATG GTA GGG AAC TGG GCT AAG GTT TTG ATT 546 
GTG ATG CTA CTC TTT GCC GGC GTT GAC GGG 576 


(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 576 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: IND8

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:


TAT GAG GTG CGC AAC GTG TCC GGG GTG TAC CAT GTC ACG 39
AAC GAC TGC TCC AAC TCA AGT ATT GTG TAT GAG GCA GCG 78
GAC ATG ATC ATG CAC ACC CCC GGG TGC GTG CCC TGC GTT 117
CGG GAG GGC AAC TTC TCT AGT TGC TGG GTA GCG CTC ACT 156
CCC ACT CTC GCG GCT AGG AAC GCC AGC GTC CCC ACC ACG 195
ACA ATA CGA CGC CAC GTC GAT TTG CTC GTT GGG GCG GCT 234
GCT TTC TGT TCC GCT ATG TAC GTG GGG GAT CTC TGC GGA 273
TCT GTT TTC CTT GTC TCC CAG CTG TTC ACC TTC TCA CCG 312
CGC CGG CAT GAG ACA GTA CAG GAC TGC AAT TGC TCC ATC 351
TAT CCC GGC CAC GTA TCA GGT CAC CGC ATG GCT TGG GAT 390
ATG ATG ATG AAC TGG TCA CCT ACA GCG GCC CTA GTG GTA 429
TCG CAG TTG CTC CGG ATC CCA CAA GCT GTC GTG GAT ATG 468
GTG GCG GGG GCC CAC TGG GGA ATC CTG GCG GGC CTT GCC 507 
TAC TAT TCC ATG GTA GGG AAC TGG GCT AAG GTT TTG ATT 546 
GTG ATG CTA CTC TTT GCC GGC GTT GAC GGG 576 


(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: P10 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 


TAT GAA GTG CGC AAC GTG TCC GGG GTG TAC CAT GTC ACG 39 
AAC GAC TGC TCC AAC TCA AGT ATT GTG TAT GAG GCA GCG 78 
GAC ATG ATA ATG CAC ACC CCC GGG TGC GTG CCC TGT GTT 117 
CGG GAG AAC AAC TCC TCC CGC TGC TGG GTA GCG CTC ACT 156 
CCC ACA CTC GCG GCT AGG AAT TCC AGC GTC CCA ACT ACG 195 
GCA ATA CGA CGC CAT GTC GAT TTG CTC GTT GGG GCG GCT 234 
GCT TTC TGC TCC GCT ATG TAC GTG GGG GAT CTC TGC GGA 273 
TCT GTT CTC CTC GTC TCC CAG CTG TTC ACC TTC TCA CCT 312 
CGC CGG CAT TGG ACA GTA CAG GAC TGC AAT TGT TCA ATC 351 
TAT CCT GGC CAC GTA TCA GGT CAC CGC ATG GCT TGG GAT 390 
ATG ATG ATG AAC TGG TCG CCC ACA GCA GCC CTA GTG GTG 429 
TCG CAG CTA CTC CGG ATC CCA CAA GCT ATC TTG GAT GTG 468 
GTG GCG GGG GCC CAC TGG GGA GTC CTG GCG GGC CTT GCC 507 
TAC TAT TCC ATG GTG GGG AAC TGG GCT AAG GTC TTG ATT 546 
GTG ATG CTA CTC TTT GCC GGC GTT GAC GGA 576 


(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S9 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

TAT GAA GTG CGC AAC GTA TCC GGG GCG TAC CAT GTC ACG 39
AAC GAC TGC TCC AAC TCA AGT ATT GTG TAC GAG GCA GCG 78
GAC GTG ATC ATG CAT ACC CCC GGG TGT GTA CCC TGC GTT 117
CAG GAG GGT AAC TCC TCC CAA TGC TGG GTG GCG CTC ACC 156
CCC ACG CTC GCG GCC AGG AAC GCT ACC GTC CCC ACC ACG 195
ACA ATA CGA CGT CAT GTC GAT TTG CTC GTT GGG GCG GCT 234
GTT TTC TGC TCC GCT ATG TAC GTG GGG GAC CTG TGC GGA 273
TCT GTT TTC CTC ATC TCC CAG CTG TTC ACC ATC TCG CCC 312
CGT CGG CAT GAG ACA GTA CAG AAC TGC AAT TGC TCA ATC 351
TAT CCC GGA CAC GTG ACA GGT CAT CGC ATG GCC TGG GAT 390
ATG ATG ATG AAC TGG TCG CCT ACA ACA GCC CTA GTG GTA 429
TCG CAG CTA CTC CGG ATC CCA CAA GCT GTC ATG GAT ATG 468
GTG GCG GGG GCC CAC TGG GGA GTC CTG GCG GGC CTC GCC 507
TAC TAT TCC ATG GTG GGG AAC TGG GCT AAG GTT TTG ATT 546
GTG ATG CTA CTT TTT GCT GGT GTT GAC GGG 576


(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 576 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S45 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

TAT GAA GTG CGC AAC GTG TCC GGG GCG TAC CAT GTC ACG 39
AAC GAC TGC TCC AAC TCA AGC ATT GTG TAT GAG GCA GTG 78
GAC GTG ATC CTG CAC ACC CCT GGG TGC GTG CCC TGC GTT 117
CGG GAG AAC AAC TCC TCC CGT TGC TGG GTG GCG CTC ACT 156
CCC ACG CTC GCG GCC AGG AAC TCC AGC GTC CCC ACT ACG 195
ACA ATA CGA CGT CAC GTC GAT TTG CTC GTT GGG GCG GCT 234
GCT TTC TGC TCC GCT ATG TAC GTG GGG GAT CTC TGC GGA 273
TCT GTT TTC CTT GTT TCC CAG CTG TTC ACC TTC TCG CCT 312
CGT CGG CAT GAG ACA GTA CAG GAC TGC AAC TGT TCA ATC 351
TAT CCC GGC CAC GTA ACA GGT CAC CGC ATG GCT TGG GAT 390
ATG ATG ATG AAC TGG TCG CCT ACA GCA GCC TTA GTG GTA 429
TCG CAG TTA CTC CGG ATC CCA CAA GCT GTC GTG GAC ATG 468
GTG GCG GGG GCC CAC TGG GGA GTC CTG GCG GGC CTT GCC 507
TAC TAT TCC ATG GTG GGG AAC TGG GCT AAG GTT CTG ATT 546
GTG ATG CTA CTC TTT GCC GGC GTT GAC GGG 576


(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA10 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

TAT GAA GTG CGC AAC GTG TCC GGG ATG TAC CAT GTC ACG 39
AAC GAC TGC TCC AAC TCA AGC ATT GTG TAT GAG GCA GCG 78
GAC ATG ATC ATG CAC ACC CCC GGG TGC GTG CCC TGC GTT 117
CGG GAG AAC AAC TCC TCC CGC TGC TGG GTA GCG CTC ACT 156
CCC ACG CTC GCG GCC AGG AAC TCC AGC GTC CCC ACT ACG 195
ACA ATA CGA CGC CAC GTC GAT TTG CTC GTT GGG GCG GCT 234
GCT TTC TGC TCC GCC ATG TAC GTG GGG GAC CTC TGC GGA 273
TCT GTT TTC CTT GTC TCC CAG CTG TTC ACC TTC TCG CCT 312
CGC CGG TAT GAG ACA GTA CAG GAC TGC AAT TGC TCA ATC 351
TAT CCC GGC CGC GTA ACA GGT CAC CGC ATG GCT TGG GAT 390
ATG ATG ATG AAC TGG TCA CCT ACA ACA GCT CTA GTA GTA 429
TCG CAG TTA CTC CGG ATC CCA CAA GCT ATC GTG GAC ATG 468
GTG GCG GGG GCC CAC TGG GGA GTC CTA GCG GGC CTT GCC 507
TAC TAT TCC ATG GTG GGG AAC TGG GCT AAG GTT TTG ATT 546
GTT ATG CTA CTC TTT GCC GGC GTT GAC GGG 576


(2) INFORMATION FOR SEQ ID NO:22: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SW2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

TAT GAA GTG CGC AAC GTG TCC GGG GTG TAT CAT GTC ACG 39 

AAC GAC TGT TCC AAC TCA AGC ATT GTG TAT GAG ACA GCG 78 

GAC ATG ATC ATG CAT ACC CCC GGG TGC GTG CCC TGC GTT 117 

CGG GAG GCC AAC TCC TCC CGC TGC TGG GTA GCG CTC ACT 156

CCC ACG CTA GCA GCC AGG AAC ACC AGC GTC CCC ACT ACG 195

ACA ATA CGA CGC CAC GTC GAT TTG CTC GTT GGG GCG GCT 234

GCT TTC TGC TCC GTT ATG TAC GTG GGG GAT CTC TGC GGA 273 

TCT GTT TTC CTC GTC TCC CAG CTG TTC ACT TTT TCA CCT 312 

CGC CGG CAC GAG ACA GTA CAG GAC TGC AAC TGT TCC ATC 351 

TAT CCC GGC CAC GTA TCA GGT CAC CGC ATG GCT TGG GAC 390 

ATG ATG ATG AAC TGG TCA CCT ACA GCA GCC CTG GTG GTA 429 

TCG CAG TTA CTC CGG ATC CCA CAA GCT GTC GTG GAC ATG 468

GTA GCG GGG GCC CAC TGG GGA GTC CTG GCG GGC CTT GCA 507

TAC TAT TCC ATG GTG GGG AAC TGG GCT AAG GTT TTG ATT 546

GTG ATG CTA CTC TTT GCT GGC GTT GAC GGG 576


(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 576 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: T3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

TAC GAA GTG CGC AAC GTG TCC GGG GTG TAC TAT GTC ACG 39
AAC GAC TGT TCC AAC TCA AGC ATT GTG TAT GAG ACA GCG 78
GAC ATG ATC ATG CAC ACC CCT GGG TGC GTG CCC TGC GTT 117
CGG GAG AGC AAT TCC TCC CGC TGC TGG GTA GCG CTT ACT 156
CCC ACG CTC GCG GCC AGG AAC GCC AGC GTC CCC ACT AAG 195
ACA ATA CGA CGT CAC GTC GAC TTG CTC GTT GGG GCG GCT 234
GCT TTC TGT TCC GCT ATG TAC GTG GGG GAT CTC TGC GGA 273
TCT GTT TTC CTC GTC TCC CAG CTG TTC ACT TTC TCG CCT 312
CGC CGG CAT GAG ACA GTA CAG GAC TGC AAC TGC TCA ATC 351
TAT CCC GGC CAC GTA ACA GGT CAC CGT ATG GCT TGG GAT 390
ATG ATG ATG AAC TGG TCG CCC ACA ACG GCA CTA GTG GTG 429
TCG CAG TTG CTC CGG ATC CCA CAA GCT GTC GTG GAC ATG 468
GTG GCG GGG GCC CAC TGG GGA GTC CTG GCG GGC CTT GCC 507
TAC TAT TCC ATG GTG GGG AAC TGG GCT AAG GTT TTG ATT 546
GTG CTG CTA CTC TTT GCC GGC GTT GAT GGG 576


(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 576 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: T10

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:


TAT GAA GTG CGC AAC GTG TCC GGG ATG TAC CAT GTC ACG 39
AAC GAC TGC TCC AAC TCA AGC ATT GTG TTT GAG GCA GCG 78
GAC TTG ATC ATG CAC ACC CCC GGG TGC GTG CCC TGC GTT 117
CGG GAG GGC AAC TCC TCC CGC TGC TGG GTA GCG CTC ACT 156 
CCC ACG CTC GCG GCC AGG AAC ACC AGC GTC CCC ACT ACG 195 
ACG ATA CGA CGC CAT GTC GAT TTG CTC GTT GGG GCG GCT 234 
GCT TTC TGC TCC GCT ATG TAT GTG GGA GAC CTC TGC GGA 273 
TCT GTT TTC CTC GTC TCT CAG CTG TTC ACC TTC TCG CCT 312 
CGC CGG CAT GAG ACT TTG CAG GAC TGC AAC TGC TCA ATC 351 
TAT CCC GGC CAT CTG TCA GGT CAC CGC ATG GCT TGG GAC 390 
ATG ATG ATG AAC TGG TCG CCT ACA ACA GCT CTA GTG GTG 429 
TCG CAG TTA CTC CGG ATC CCA CAA GCT GTC ATG GAC ATG 468 
GTG ACA GGG GCC CAC TGG GGA GTC CTG GCG GGC CTT GCC 507 
TAC TAT TCC ATG GCG GGG AAC TGG GCT AAG GTT TTA ATT 546 
GTG ATG CTA CTC TTT GCC GGC GTT GAT GGG 576 


(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: US6 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 


TAT GAA GTG CGC AAC GTG TCC GGG ATG TAC CAT GTC ACG 39

AAC GAC TGC TCC AAC TCA AGC ATT GTG TAT GAG GCA GCG 78 

GAC ATG ATC ATG CAC ACT CCC GGG TGC GTG CCC TGT GTT 117 

CGG GAG AAC AAT TCC TCC CGC TGC TGG GTA GCG CTC ACT 156 

CCC ACG CTC GCG GCC AGG AAC GCT AGC GTC CCC ACT ACG 195 

ACA ATA CGA CGC CAC GTC GAT TTG CTC GTT GGG GCG GCT 234 

ACT TTC TGC TCC GCT ATG TAC GTG GGG GAC CTC TGC GGG 273 

TCC GTT TTC CTC ATC TCC CAG CTG TTC ACC TTC TCG CCT 312 

CGT CAG CAT GAG ACA GTA CAG GAC TGC AAT TGT TCA ATC 351

TAT CCC GGC CAC GTA TCA GGT CAC CGC ATG GCT TGG GAT 390 

ATG ATG ATG AAT TGG TCA CCT ACA GCA GCC CTA GTG GTA 429 

TCG CAG TTA CTC CGG ATC CCA CAA GCT GTC ATG GAC ATG 468 

GTG GCG GGG GCC CAC TGG GGA GTC CTG GCG GGC CTT GCC 507 

TAC TAT TCC ATG GTG GGG AAC TGG GCT AAG GTT CTG ATT 546 

GTG TTG CTA CTC TTT GCC GGC GTT GAC GGG 576 


(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: T2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

GCC CAA GTG AGG AAC ACC AGC CGC GGT TAC ATG GTG ACT 39
AAC GAC TGT TCC AAT GAG AGC ATC ACC TGG CAG CTC CAA 78
GCC GCG GTT CTC CAC GTC CCC GGG TGT ATC CCG TGT GAG 117
AGG CTG GGA AAT ACA TCC CGA TGC TGG ATA CCG GTC ACA 156
CCA AAC GTG GCC GTG CGG CAG CCC GGC GCT CTT ACG CAG 195
GGC TTG CGG ACG CAC ATC GAC ATG GTT GTG ATG TCC GCC 234
ACG CTC TGC TCT GCC CTC TAC GTG GGG GAC CTC TGC GGC 273
GGG GTG ATG CTC GCA GCC CAG ATG TTC ATT GTC TCG CCG 312
CGA CGC CAC TGG TTT GTG CAA GAA TGC AAT TGC TCC ATC 351
TAC CCC GGT ACC ATC ACT GGA CAC CGT ATG GCA TGG GAC 390
ATG ATG ATG AAC TGG TCG CCC ACA GCC ACC ATG ATC CTG 429
GCG TAC GCG ATG CGC GTT CCC GAG GTC ATC ATA GAC ATC 468
ATC GGC GGG GCT CAC TGG GGC GTC ATG TTT GGC TTG GCC 507
TAC TTC TCT ATG CAG GGA GCG TGG GCG AAG GTC ATT GTC 546
ATC CTC TTG CTG GCT GCT GGG GTG GAC GCG 576


(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 576 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: T4

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

GCA CAA GTG AAG AAC ACC ACT AAC AGC TAC ATG GTG ACC 39
AAC GAC TGT TCT AAT GAC AGC ATC ACT TGG CAG CTC CAG 78
GCC GCG GTC CTC CAC GTC CCC GGG TGT GTC CCG TGC GAG 117
AAA ACG GGA AAT ACA TCT CGG TGC TGG ATA CCG GTT TCA 156
CCA AAC GTG GCC GTG CGG CAG CCC GGC GCC CTC ACG CAG 195
GGC TTG CGG ACG CAC ATT GAC ATG GTT GTG ATG TCC GCC 234
ACG CTC TGC TCT GCT CTT TAC GTG GGG GAC CTC TGC GGC 273
GGG GTG ATG CTC GCA GCC CAG ATG TTC ATC GTC TCG CCG 312
CAA CAT CAC TGG TTT GTG CAA GAC TGC AAT TGC TCT ATC 351
TAC CCT GGC ACC ATC ACT GGA CAC CGT ATG GCA TGG GAT 390
ATG ATG ATG AAC TGG TCG CCC ACG GCC ACC ATG ATC CTG 429
GCG TAC GCG ATG CGC GTT CCC GAG GTC ATC TTA GAC ATC 468
GTT AGC GGG GCA CAC TGG GGC GTC ATG TTC GGC TTG GCC 507
TAC TTC TCT ATG CAG GGA GCG TGG GCG AAA GTC GTT GTC 546
ATC CTT CTG CTG GCC GCT GGG GTG GAC GCG 576


(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 576 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: T9

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

GCC GAA GTG AAG AAC ACC AGT ACC AGC TAC ATG GTG ACA 39
AAT GAC TGT TCC AAC GAC AGC ATC ACC TGG CAA CTC CAG 78
GCC GCG GTC CTC CAC GTC CCC GGG TGC GTC CCG TGC GAG 117
AGA GTT GGA AAC GCG TCG CGG TGC TGG ATA CCG GTC TCG 156
CCA AAC GTA GCT GTG CAG CGG CCT GGC GCC CTC ACG CAG 195
GGC TTG CGG ACG CAC ATC GAC ATG GTT GTG ATG TCC GCC 234
ACG CTC TGC TCC GCT CTC TAC GTG GGG GAT CTC TGC GGC 273
GGG GTA ATG CTC GCC GCT CAG ATG TTC ATT ATC TCG CCG 312
CAG CAC CAC TGG TTT GTG CAG GAA TGC AAC TGC TCC ATT 351
TAC CCT GGT ACC ATC ACT GGA CAC CGT ATG GCA TGG GAC 390
ATG ATG ATG AAC TGG TCG CCC ACA ACC ACC ATG ATC TTG 429
GCG TAC GCG ATG CGC GTT CCC GAG GTC ATC ATA GAC ATC 468
ATC AGC GGA GCT CAC TGG GGC GTC ATG TTC GGC CTA GCC 507
TAC TTC TCT ATG CAG GGA GCG TGG GCG AAG GTC GTT GTC 546
ATC CTG TTG CTC ACC GCT GGC GTG GAC GCG 576


(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 576 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: US10

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:


GTC CAA GTG AAA AAC ACC AGT ACC AGC TAT ATG GTG ACC 39
AAT GAC TGC TCC AAC GAC AGC ATC ACT TGG CAA CTT GAG 78
GCT GCG GTC CTC CAC GTT CCC GGG TGT GTC CCG TGC GAG 117
AAA GTG GGA AAT ACA TCT CGG TGC TGG ATA CCG GTC TCA 156
CCA AAT GTG GCC GTG CAG CGG CCT GGC GCC CTC ACG CAG 195 
GGC TTG CGG ACT CAC ATC GAC ATG GTC GTG ATG TCC GCC 234 
ACG CTC TGC TCC GCT CTT TAC GTG GGG GAC TTC TGC GGT 273 
GGG ATG ATG CTC GCA GCC CAA ATG TTC ATT GTC TCG CCG 312 
CGC CAC CAC TCG TTT GTG CAG GAA TGC AAC TGC TCC ATC 351 
TAC CCC GGT ACC ATC ACC GGG CAC CGT ATG GCA TGG GAC 390 
ATG ATG ATG AAC TGG TCG CCC ACG GCC ACT TTG ATC CTG 429 
GCG TAC GTG ATG CGC GTT CCC GAG GTC ATC ATA GAC ATC 468 
ATT AGC GGG GCG CAT TGG GGC GTC TTG TTC GGC TTA GCC 507 
TAC TTC TCT ATG CAG GGA GCG TGG GCG AAA GTC GTT GTC 546 
ATC CTT CTG CTA GCC GCT GGG GTG GAC GCG 576 


(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK8 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:


GTG GAA GTC AGG AAC ATC AGT TCC AGC TAC TAC GCC ACC 39 
AAT GAT TGC TCA AAC AAC AGC ATC ACC TGG CAA CTC ACC 78 
GAC GCA GTT CTC CAC CTT CCC GGA TGC GTC CCA TGT GAG 117 
AAT GAC AAT GGC ACC CTG CGC TGC TGG ATA CAA GTG ACA 156 
CCT AAT GTG GCT GTG AAA CAC CGC GGC GCA CTT ACT CAT 195 
AAC CTG CGA ACA CAC GTC GAC GTG ATC GTA ATG GCA GCT 234 
ACG GTC TGC TCG GCC TTG TAT GTG GGA GAC GTA TGC GGG 273 
GCC GTG ATG ATC GTG TCG CAG GCT CTC ATA ATA TCG CCT 312 
GAA CGC CAC AAC TTT ACC CAG GAG TGC AAC TGT TCC ATC 351 
TAC CAA GGT CAT ATC ACC GGC CAC CGC ATG GCA TGG GAC 390 
ATG ATG CTA AAC TGG TCA CCA ACT CTT ACC ATG ATC CTC 429 
GCC TAT GCC GCT CGT GTT CCT GAG CTA GCC CTC CAG GTT 468 
GTC TTC GGC GGC CAT TGG GGC GTG GTG TTT GGC TTG GCC 507 
TAT TTC TCC ATG CAG GGA GCG TGG GCC AAA GTC ATT GCC 546 
ATC CTC CTT CTT GTC GCA GGA GTG GAT GCA 576 


(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK11 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

GTG GAA GTC AGG AAC ACC AGT TCT AGT TAC TAC GCC ACC 39
AAT GAT TGC TCA AAC AAC AGC ATC ACC TGG CAA CTC ACC 78
AAC GCA GTT CTC CAC CTT CCC GGA TGC GTC CCA TGT GAG 117
AAT GAC AAT GGC ACC CTG CAC TGC TGG ATA CAA GTG ACA 156
CCT AAT GTG GCT GTG AAA CAC CGC GGC GCA CTC ACT CAC 195
AAC CTG CGA GCA CAT ATA GAT ATG ATT GTA ATG GCA GCT 234
ACG GTC TGC TCG GCC TTG TAT GTG GGA GAC GTG TGC GGG 273
GCC GTG ATG ATC GTG TCG CAG GCT TTC ATA GTA TCG CCA 312
GAA CAC CAC CAC TTT ACC CAA GAG TGC AAC TGT TCC ATC 351
TAC CAA GGT CAC ATC ACC GGC CAC CGC ATG GCA TGG GAC 390
ATG ATG CTT AAC TGG TCA CCA ACT CTC ACC ATG ATC CTC 429
GCC TAT GCC GCC CGT GTT CCT GAG CTA GTC CTT GAA GTC 468
GTC TTC GGT GGT CAT TGG GGT GTG GTG TTT GGC TTG GCC 507
TAT TTC TCC ATG CAG GGA GCG TGG GCC AAG GTC ATT GCC 546
ATC CTC CTT CTT GTA GCA GGA GTG GAT GCA 576


(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 576 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: SW3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

GTG GAA GTC AGG AAC ATC AGT TCT AGC TAC TAT GCC ACC 39
AAT GAT TGC TCA AAC AGC AGC ATC ACC TGG CAA CTC ACC 78
AAC GCA GTC CTC CAC CTT CCC GGA TGC GTC CCG TGT GAG 117
AAT GAT AAT GGC ACC CTG CAC TGC TGG ATA CAA GTG ACA 156
CCT AAT GTG GCT GTG AAA CAC CGC GGC GCG CTC ACT CAC 195
AAC CTG CGA GCA CAC GTC GAT ATG ATC GTA ATG GCA GCT 234
ACG GTC TGC TCG GCC TTG TAT GTG GGA GAC ATG TGC GGG 273
GCC GTG ATG ATC GTG TCG CAG GCT TTC ATA ATA TCG CCA 312
GAA CGC CAC AAC TTT ACC CAA GAG TGC AAC TGT TCC ATC 351
TAC CAA GGT CGT ATC ACC GGC CAC CGC ATG GCG TGG GAC 390
ATG ATG CTA AAC TGG TCA CCA ACT CTT ACC ATG ATC CTT 429
GCC TAT GCC GCT CGT GTT CCT GAG CTA GTC CTT GAA GTT 468
GTC TTC GGC GGC CAT TGG GGC GTG GTG TTT GGC TTG GCC 507
TAT TTC TCC ATG CAA GGA GCG TGG GCC AAG GTC ATT GCC 546
ATC CTC CTG CTT GTC GCA GGA GTG GAT GCA 576


(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: T8 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 

GTG GAA GTT AGA AAC ACC AGT TTT AGC TAC TAC GCC ACC 

AAT GAT TGC TCG AAC AAC AGC ATC ACC TGG CAG CTC ACC 

AAC GCA GTT CTC CAC CTT CCC GGA TGC GTC CCA TGT GAG 

AAT GAC AAT GGC ACC TTG CGC TGC TGG ATA CAA GTA ACA 

CCT AAT GTG GCT GTG AAA CAC CGT GGC GCA CTC ACT CAC 

AAC CTG CGA ACG CAT GTC GAC GTG ATC GTA ATG GCA GCT 

ACG GTC TGC TCG GCC TTG TAT GTG GGG GAC GTG TGC GGG 

GCC GTG ATG ATA GCG TCG CAG GCT TTC ATA ATA TCG CCA 

GAA CGC CAC AAC TTC ACC CAG GAG TGC AAC TGT TCC ATC 

TAC CAA GGT CAT ATC ACC GGC CAC CGC ATG GCA TGG GAC 

ATG ATG CTG AAC TGG TCA CCA ACT CTC ACC ATG ATC CTC 

GCC TAC GCT GCT CGT GTG CCT GAA CTA GTC CTT GAA GTT 

GTC TTC GGC GGC CAT TGG GGC GTG GTG TTT GGC TTG GCC 

TAT TTC TCC ATG CAA GGA GCG TGG GCC AAA GTC ATC GCC 

ATC CTC CTC CTT GTC GCA GGA GTG GAC GCA 


(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S83 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

GTG GAG GTC AAG GAC ACC GGC GAC TCC TAC ATG CCG ACC 

AAC GAT TGC TCC AAC TCT AGT ATC GTT TGG CAG CTT GAA 

GGA GCA GTG CTT CAT ACT CCT GGA TGC GTC CCT TGT GAG 

CGT ACC GCC AAC GTC TCT CGA TGT TGG GTG CCG GTT GCC 

CCC AAT CTC GCC ATA AGT CAA CCT GGC GCT CTC ACT AAG 


39 

78 

117 

156 

195 

234 

273 

312 

351 

390 

429 

468 

507 

546 

576 


39 

78 

117 

156 

195 
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GGC CTG CGA GCA CAC ATC GAT ATC ATC GTG ATG TCT GCT 
ACG GTC TGT TCT GCC CTT TAT GTG GGG GAC GTG TGT GGC 
GCG CTG ATG CTG GCC GCT CAG GTC GTC GTC GTG TCG CCA 
CAA CAC CAT ACG TTT GTC CAG GAA TGC AAC TGT TCC ATA 
TAC CCG GGC CGC ATT ACG GGA CAC CGC ATG GCT TGG GAT 
ATG ATG ATG AAC TGG TCG CCC ACT ACC ACC ATG CTC CTG 
GCG TAC TTG GTG CGC ATC CCG GAA GTC ATC TTG GAT ATT 
GTT ACA GGA GGT CAT TGG GGT GTA ATG TTT GGC CTC GCT 
TAC TTC TCC ATG CAG GGA TCG TGG GCG AAG GTC ATC GTT 
ATC CTC CTG CTG ACT GCT GGG GTG GAG GCG 


(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


15 


20 


25 


(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK12 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

TTA GAG TGG CGG AAT GTG TCC GGC CTC TAC GTC CTT ACC 

AAC GAC TGT TCC AAT AGC AGT ATC GTG TAT GAG GCC GAT 

GAC GTC ATT CTG CAC ACA CCT GGC TGT GTA CCT TGT GTT 

CAG GAC GGC AAT ACA TCT ACG TGC TGG ACC TCA GTG ACG 

CCT ACA GTG GCA GTC AGG TAC GTC GGA GCA ACC ACC GCT 

TCG ATA CGC AGT CAT GTG GAC CTG CTA GTG GGC GCG GCC 

ACG ATG TGC TCT GCG CTC TAC GTG GGT GAT GTG TGT GGG 

GCC GTC TTC CTT GTG GGA CAA GCC TTC ACG TTC AGA CCT 

CGT CGC CAT CAA ACA GTC CAG ACC TGT AAC TGC TCG CTG 

TAC CCA GGC CAT CTT TCA GGA CAT CGA ATG GCT TGG GAT 

ATG ATG ATG AAT TGG TCC CCC GCT GTG GGT ATG GTG GTA 

GCG CAC GTC CTG CGT CTG CCC CAG ACC TTG TTC GAC ATA 

ATA GCT GGG GCC CAT TGG GGC ATC ATG GCG GGC CTA GCC 

TAT TAC TCC ATG CAG GGC AAC TGG GCC AAG GTC GCT ATC 

ATC ATG GTT ATG TTT TCA GGA GTC GAT GCC 


(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 




(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 




(C) INDIVIDUAL ISOLATE: HK10 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

CTA GAG TGG CGG AAT GTG TCT GGC CTC TAT GTC CTT ACC 
AAC GAC TGT CCC AAT AGC AGT ATT GTG TAT GAG GCC GAT 
GAC GTC ATT CTG CAC ACA CCT GGC TGT GTA CCT TGT GTT 
CAG GAC GGC AAT ACA TCC ACG TGC TGG ACC TCG GTG ACA 
CCT ACA GTG GCA GTC AGG TAC GTC GGA GCA ACC ACC GCC 
TCG ATA CGC AGT CAT GTG GAC CTG TTA GTG GGC GCG GCC 
ACG ATG TGC TCT GCG CTC TAC GTG GGC GAT ATG TGT GGG 
GCC GTC TTC CTC GTG GGA CAA GCC TTC ACG TTC AGA CCG 
CGT CGC CAT CAA ACG GTC CAG ACC TGT AAC TGC TCG CTG 
TAC CCA GGC CAC CTT TCA GGA CAT CGA ATG GCT TGG GAT 
ATG ATG ATG AAT TGG TCC CCC GCC GTG GGT ATG GTG GTG 
GCG CAC GTC CTG CGG TTG CCC CAG ACC TTG TTC GAC ATA 
ATA GCC GGG GCC CAT TGG GGC ATC TTG GCA GGC CTA GCC 
TAT TAC TCC ATG CAG GGC AAC TGG GCC AAG GTC GCT ATC 
ATC ATG GTT ATG TTT TCA GGG GTC GAT GCC 




(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 




(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

CTA GAG TGG CGG AAT ACG TCT GGC CTC TAT GTC CTC ACC 

AAC GAC TGT TCC AAT AGC AGT ATT GTG TAT GAG GCC GAT 

GAC GTT ATT CTG CAC ACA CCT GGC TGT GTA CCT TGT GTT 

CAG GAC GGT AAT ACA TCC ACG TGC TGG ACC CCA GTG ACA 

CCT ACA GTG GCA GTC AGG TAT GTC GGA GCA ACC ACC GCT 

TCG ATA CGC AGT CAT GTG GAC CTA TTG GTG GGC GCG GCC 

ACT ATG TGC TCT GCG CTC TAC GTG GGT GAT ATG TGT GGG 

GCC GTC TTT CTC GTG GGA CAA GCC TTC ACG TTC AGA CCT 

CGT CGC CAT CAA ACG GTC CAG ACC TGT AAC TGC TCG CTG 

TAC CCA GGC CAT CTT TCA GGA CAT CGC ATG GCT TGG GAT 

ATG ATG ATG AAT TGG TCC CCC GCT GTG GGT ATG GTG GTG 

GCG CAC GTT CTG CGT TTG CCC CAG ACC GTG TTC GAC ATA 

ATA GCC GGG GCC CAT TGG GGC ATC TTG GCG GGC CTA GCC 

TAT TAC TCC ATG CAA GGC AAC TGG GCC AAG GTC GCT ATC 

ATC ATG GTT ATG TTT TCA GGG GTC GAC GCC 




(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S52 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 


CTA GAG TGG CGG AAT ACG TCT GGC CTC TAT GTC CTT ACC 39 
AAC GAC TGT TCC AAT AGC AGT ATT GTG TAT GAG GCC GAT 78 
GAC GTC ATT CTG CAC ACA CCC GGC TGT GTA CCT TGT GTT 117 
CAG GAC GGC AAT ACA TCC ATG TGC TGG ACC CCA GTG ACA 156 
CCT ACG GTG GCA GTC AGG TAC GTC GGA GCA ACC ACC GCT 195 
TCG ATA CGC AGT CAT GTG GAC CTA TTA GTG GGC GCG GCC 234 
ACG CTG TGC TCT GCG CTC TAT GTG GGT GAT ATG TGT GGG 273 
GCC GTC TTT CTC GTG GGA CAA GCC TTC ACG TTC AGA CCT 312 
CGT CGC CAT CAA ACG GTC CAG ACC TGT AAC TGC TCG CTG 351 
TAC CCA GGC CAT GTT TCA GGA CAT CGA ATG GCT TGG GAT 390 
ATG ATG ATG AAT TGG TCC CCC GCT GTG GGT ATG GTG GTG 429 
GCG CAC ATC CTG CGA TTG CCC CAG ACC TTG TTT GAC ATA 468 
CTG GCC GGG GCC CAT TGG GGC ATC TTG GCG GGC CTA GCC 507 
TAT TAT TCT ATG CAG GGC AAC TGG GCC AAG GTC GCT ATT 546 
GTC ATG ATT ATG TTT TCA GGG GTC GAT GCC 576 
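
Each nucleotide line in this listing carries 13 codons followed by a running base count, and the 576 base pairs of each sequence correspond to the 192 amino acids of the deduced sequences (576 / 3 = 192). The following is a minimal sketch, assuming the standard genetic code, of how one listing line can be checked against its counter and translated into three-letter residues. The function names `parse_listing_line` and `translate` are hypothetical helpers introduced only for illustration; the input is the first line of SEQ ID NO: 38 as printed above.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the deduced sequences in this listing use the standard code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter residue codes, as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def parse_listing_line(line):
    """Split a listing line into its codons and its trailing base counter."""
    *codons, counter = line.split()
    return codons, int(counter)


def translate(codons):
    """Translate a list of DNA codons into three-letter residue codes."""
    return [THREE_LETTER[CODON_TABLE[codon]] for codon in codons]


# First line of SEQ ID NO: 38 (isolate S52), copied from the listing.
line = "CTA GAG TGG CGG AAT ACG TCT GGC CTC TAT GTC CTT ACC 39"
codons, counter = parse_listing_line(line)

# The counter on the first line should equal three bases per codon.
bases_on_line = 3 * len(codons)
residues = translate(codons)
print(bases_on_line, " ".join(residues))
```

The same check can be repeated line by line; for later lines the counter is cumulative, so the expected value is the previous counter plus three times the number of codons on the line.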


(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S54 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:


CTA GAG TGG CGG AAT ACG TCT GGC CTC TAT ATC CTT ACC 39 
AAC GAC TGT TCC AAT AGC AGT ATT GTG TAT GAG GCC GAT 78 
GAC GTC ATT CTG CAC ACA CCC GGC TGT GTA CCT TGT GTT 117 
CAG GAC GGC AAT ACA TCC ACG TGC TGG ACC CCA GTG ACA 156 
CCT ACG GTG GCA GTC AGG TAC GTC GGA GCA ACC ACC GCT 195 
TCG ATA CGC AGT CAT GTG GAC CTA TTA GTG GGC GCG GCC 234 
ACG CTG TGC TCT GCG CTC TAT GTG GGT GAT ATG TGT GGG 273
GCC GTC TTT CTC GTG GGA CAA GCC TTC ACG TTC AGA CCT 312
CGT CGC CAT CAA ACG GTC CAG ACC TGT AAC TGC TCG CTG 351
TAC CCA GGC CAT CTT TCA GGA CAT CGA ATG GCT TGG GAT 390
ATG ATG ATG AAT TGG TCC CCC GCT GTG GGT ATG GTG GTG 429
GCG CAC ATC CTG CGA TTG CCC CAG ACC TTG TTT GAC ATA 468
CTG GCC GGG GCC CAT TGG GGC ATC TTG GCG GGC CTA GCC 507
TAT TAT TCT ATG CAG GGC AAC TGG GCC AAG GTC GCT ATC 546
ATC ATG ATT ATG TTT TCA GGG GTC GAT GCC 576


(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 




(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z4 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

GAG CAC TAC CGG AAT GCT TCG GGC ATC TAT CAC ATC ACC 

AAT GAT TGT CCG AAT TCC AGT ATA GTC TAT GAA GCT GAC 

CAT CAC ATC CTA CAC TTG CCG GGG TGC GTA CCC TGT GTG 

ATG ACT GGG AAC ACA TCG CGT TGC TGG ACG CCG GTG ACG 

CCT ACA GTG GCT GTC GCA CAC CCG GGC GCT CCG CTT GAG 

TCG TTC CGG CGA CAT GTG GAC TTA ATG GTA GGC GCG GCC 

ACT TTG TGT TCT GCC CTC TAT GTT GGG GAC CTC TGC GGA 

GGT GCC TTC CTG ATG GGG CAG ATG ATC ACT TTT CGG CCG 

CGT CGC CAC TGG ACC ACG CAG GAG TGC AAT TGT TCC ATC 

TAC ACT GGC CAT ATC ACC GGC CAC AGG ATG GCG TGG GAC 

ATG ATG ATG AAC TGG AGC CCT ACC ACC ACT CTG CTC CTC 

GCC CAG ATC ATG AGG GTC CCC ACA GCC TTT CTC GAC ATG 

GTT GCC GGA GGC CAC TGG GGC GTC CTC GCG GGC TTG GCG 

TAC TTC AGC ATG CAA GGC AAT TGG GCC AAG GTA GTC CTG 

GTC CTT TTC CTC TTT GCT GGG GTA GAC GCC 

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 




(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z1 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 

GTG CAC TAC CGG AAT GCT TCG GGC GTC TAT CAT GTC ACC 
AAT GAT TGC CCT AAC ACC AGC ATA GTG TAC GAG ACG GAG 
CAC CAC ATC ATG CAC TTG CCA GGG TGT GTC CCC TGT GTG 
CGG ACG GAG AAT ACT TCT CGC TGC TGG GTG CCC TTG ACC 
CCC ACT GTG GCC GCG CCC TAT CCC AAC GCA CCG TTA GAG 
TCC ATG CGC AGG CAT GTA GAC CTG ATG GTG GGT GCG GCT 
ACT ATG TGT TCC GCC TTC TAC ATT GGA GAT CTG TGT GGA 
GGC GTC TTC CTA GTG GGC CAG CTG TTC GAC TTC CGA CCG 
CGC CGG CAC TGG ACC ACC CAG GAT TGC AAC TGC TCC ATC 
TAT CCT GGT CAC GTC TCG GGC CAC AGG ATG GCC TGG GAC 
ATG ATG ATG AAC TGG AGC CCT ACC AGC GCG CTG ATT ATG 
GCT CAG ATC TTA CGG ATC CCC TCT ATC CTA GGT GAC TTG 
CTC ACC GGG GGT CAC TGG GGA GTT CTT GCT GGT CTA GCT 
TTC TTC AGC ATG CAG AGT AAC TGG GCG AAG GTC ATC CTG 
GTC CTA TTC CTC TTT GCC GGG GTC GAG GGA 


(2) INFORMATION FOR SEQ ID NO:42: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 




(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z6 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

GTT AAC TAT CGC AAT GCC TCG GGC GTC TAT CAC GTC ACC 

AAC GAC TGC CCG AAC TCG AGC ATA GTG TAT GAG GCC GAA 

CAC CAG ATC TTA CAC CTC CCA GGG TGC TTG CCC TGT GTG 

AGG GTT GGG AAT CAG TCA CGC TGC TGG GTG GCC CTT ACT 

CCC ACC GTG GCG GTG TCT TAT ATC GGT GCT CCG CTT GAC 

TCC CTC CGG AGA CAT GTG GAC CTG ATG GTG GGC GCC GCT 

ACT GTA TGC TCT GCC CTC TAC GTT GGA GAT CTG TGC GGT 

GGT GCA TTC TTG GTT GGC CAG ATG TTC TCC TTC CAG CCG 

CGA CGC CAC TGG ACT ACG CAG GAC TGC AAT TGT TCT ATC 

TAC GCA GGG CAT ATC ACG GGC CAC AGG ATG GCA TGG GAC 

ATG ATG ATG AAC TGG AGT CCC ACA ACC ACC CTG CTT CTC 

GCC CAG GTC ATG AGG ATC CCT AGC ACT CTG GTA GAT CTA 

CTC GCT GGA GGG CAC TGG GGC GTC CTT GTT GGG TTG GCG 

TAC TTC AGT ATG CAA GCT AAT TGG GCC AAA GTC ATC CTG 

GTC CTT TTC CTC TTC GCT GGA GTT GAT GCC 


(2) INFORMATION FOR SEQ ID NO:43: 


(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z7 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 


GTC AAC TAT CAC AAT GCC TCG GGC GTC TAT CAC ATC ACC 39 

AAC GAC TGC CCG AAC TCG AGC ATA ATG TAT GAG GCC GAA 78 

CAC CAC ATC CTA CAC CTC CCA GGG TGC GTA CCC TGT GTG 117

AGG GAG GGG AAC CAG TCA CGC TGC TGG GTG GCC CTT ACT 156 

CCC ACC GTG GCG GCG CCT TAT ATC GGT GCA CCG CTT GAA 195 

TCC ATC CGG AGA CAT GTG GAC CTG ATG GTA GGC GCT GCT 234 

ACA GTG TGC TCC GCT CTC TAC ATT GGG GAC CTG TGC GGT 273 

GGC GTA TTT TTG GTT GGT CAG ATG TTT TCT TTC CAG CCG 312 

CGA CGC CAC TGG ACT ACG CAG GAC TGC AAT TGT TCC ATC 351 

TAT GCG GGG CAC GTT ACA GGC CAC AGA ATG GCA TGG GAC 390 

ATG ATG ATG AAC TGG AGT CCC ACA ACC ACC TTG GTC CTC 429

GCC CAG GTT ATG AGG ATC CCT AGC ACT CTG GTG GAC CTA 468 

CTC ACT GGA GGG CAC TGG GGT ATC CTT ATC GGG GTG GCA 507 

TAC TTC TGC ATG CAA GCT AAT TGG GCC AAG GTC ATT CTG 546 

GTC CTT TTC CTC TAC GCT GGA GTT GAT GCC 576 


(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK13 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: 

TAC AAC TAT CGC AAC AGC TCG GGT GTC TAC CAT GTC ACC 39 

AAC GAT TGC CCG AAC TCG AGC ATA GTC TAT GAA ACC GAT 78 

TAC CAC ATC TTA CAC CTC CCG GGA TGC GTT CCT TGC GTG 117

AGG GAA GGG AAC AAG TCT ACA TGC TGG GTG TCT CTC ACC 156 

CCC ACC GTG GCT GCG CAA CAT CTG AAT GCT CCG CTT GAG 195 

TCT TTG AGA CGT CAC GTG GAT CTG ATG GTG GGC GGC GCC 234 

ACT CTC TGC TCC GCC CTC TAC ATC GGA GAC GTG TGT GGG 273 

GGT GTG TTC TTG GTC GGT CAA CTG TTC ACC TTC CAA CCT 312 

CGC CGC CAC TGG ACC ACC CAA GAC TGC AAT TGT TCC ATC 351 

TAC ACA GGA CAT ATC ACA GGA CAC AGA ATG GCT TGG GAC 390

ATG ATG ATG AAT TGG AGC CCC ACT GCG ACG CTG GTC CTC 429

GCC CAA CTT ATG AGG ATC CCA GGC GCC ATG GTC GAC CTG 468

CTT GCA GGC GGC CAC TGG GGC ATT CTG GTT GGC ATA GCG 507

TAC TTC AGC ATG CAA GCT AAT TGG GCC AAG GTT ATC CTG 546

GTC CTG TTT CTC TTT GCT GGA GTC GAC GCT 576


(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA1 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

GTT CCC TAC CGG AAT GCC TCT GGG GTT TAC CAT GTC ACC 

AAT GAC TGC CCA AAC TCC TCC ATA GTC TAC GAG GCT GAT 

AGC CTG ATC TTG CAC GCA CCT GGC TGC GTG CCC TGT GTC 

AGG CAA GAT AAT GTC AGT AGG TGC TGG GTC CAA ATC ACC 

CCC ACA CTG TCA GCC CCG ACC TTC GGA GCG GTC ACG GCT 

CCT CTT CGG AGG GCC GTT GAC TAC TTA GCG GGA GGA GCT 

GCT CTC TGC TCC GCA CTA TAC GTC GGC GAC GCG TGC GGG 

GCA GTG TTT CTG GTA GGC CAA ATG TTC ACC TAT AGG CCT 

CGC CAG CAT ACC ACA GTG CAG GAC TGC AAC TGT TCC ATT 

TAC AGT GGC CAT ATC ACC GGC CAC CGG ATG GCT TGG GAC 

ATG ATG ATG AAT TGG TCA CCT ACG ACA GCC TTG CTG ATG 

GCC CAG ATG CTA CGG ATC CCC CAG GTG GTC ATA GAC ATC 

ATA GCC GGG GGC CAC TGG GGG GTC TTG TTT GCC GCC GCA 

TAC TTT GCG TCG GCC GCC AAC TGG GCT AAG GTA GTG CTG 

GTT CTG TTC CTG TTT GCG GGG GTC GAT GGC 

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA4 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:

GTT CCC TAC CGA AAC GCC TCT GGG GTT TAT CAT GTC ACC
AAT GAT TGC CCA AAC TCT TCC ATA GTT TAC GAG GCT GAT 
AAC CTG ATC TTG CAT GCA CCT GGT TGC GTG CCT TGT GTC 
AGG CAA GAT AAT GTC AGT AAG TGC TGG GTC CAA ATC ACC 
CCC ACG TTG TCA GCC CCG AAT CTC GGA GCG GTC ACG GCT 
CCT CTT CGG AGG GCC GTT GAC TAC TTA GCG GGA GGG GCT 
GCC CTC TGC TCC GCA CTA TAC GTC GGG GAC GCG TGC GGG 
GCA GTG TTT TTG GTA GGC CAA ATG TTC ACC TAT AGG CCT 
CGC CAG CAC ACT ACG GTG CAA GAC TGC AAT TGC TCT ATT 
TAC AGT GGC CAT ATC ACC GGC CAC CGG ATG GCA TGG GAC 
ATG ATG ATG AAT TGG TCA CCT ACG ACG GCC TTG CTG ATG 
GCC CAG TTG CTA CGG ATT CCC CAG GTG GTC ATC GAC ATC 
ATT GCC GGG GGC CAC TGG GGG GTC TTG TTT GCC GCC GCA 
TAT TTC GCG TCA GCG GCT AAC TGG GCT AAG GTT ATA CTG 
GTC TTG TTT CTG TTT GCG GGG GTC GAT GCC 

(2) INFORMATION FOR SEQ ID NO: 47: 




(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens 
(C) INDIVIDUAL ISOLATE: SA5 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 


GTC CCC TAC CGA AAT GCC TCT GGG GTT TAT CAT GTC ACC 39 
AAT GAT TGC CCA AAC TCT TCC ATA GTC TAC GAG GCT GAT 78 
AAC CTG ATT CTG CAC GCA CCT GGT TGC GTG CCC TGT GTC 117 
AAG GAA GGT AAT GTC AGT AGG TGC TGG GTC CAA ATC ACC 156 
CCC ACA TTG TCA GCC CCG AAC CTC GGA GCG GTC ACG GCT 195 
CCT CTT CGG AGG GTC GTT GAC TAC TTA GCG GGA GGG GCT 234 
GCC CTC TGC TCC GCA CTA TAC GTC GGG GAC GCG TGC GGG 273 
GCA GTG TTC TTG GTA GGC CAA ATG TTC ACC TAT AGG CCT 312 
CGC CAG CAT ACT ACG GTG CAG GAC TGC AAC TGT TCC ATT 351 
TAC AGC GGC CAT ATC ACC GGC CAC CGA ATG GCA TGG GAC 390 
ATG ATG ATG AAT TGG TCA CCT ACG ACA GCC TTG GTG ATG 429 
GCC CAG GTG CTA CGG ATT CCC CAA GTG GTC ATT GAC ATC 468 
ATT GCC GGG GGC CAC TGG GGG GTC TTG TTC GCC GTC GCA 507 
TAC TTC GCG TCA GCG GCT AAC TGG GCT AAG GTT GTG CTG 546 
GTC CTG TTT CTG TTT GCG GGG GTC GAT GGC 576 


(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 




(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA6 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 


GTT CCT TAC CGG AAT GCC TCT GGG GTG TAT CAT GTT ACC 39 

AAT GAT TGC CCA AAC TCT TCC ATA GTC TAT GAG GCT GAT 78 

GAC CTG ATC CTA CAC GCA CCT GGC TGC GTG CCC TGT GTC 117 

CGG AAG GAT AAT GTC AGT AGA TGC TGG GTT CAT ATC ACC 156 

CCC ACA CTA TCA GCC CCG AGC CTC GGA GCG GTC ACG GCT 195 

CCT CTT CGG AGG GCC GTT GAT TAC TTG GCG GGA GGG GCC 234

GCC CTG TGC TCC GCG TTA TAC GTC GGA GAC GTG TGC GGG 273 

GCA TTG TTT TTG GTA GGC CAA ATG TTC ACC TAT AGG CCT 312 

CGC CAG CAT GCT ACG GTA CAG GAC TGC AAC TGC TCC ATT 351 

TAC AGT GGC CAT ATC ACT GGC CAC CGG ATG GCA TGG GAC 390 

ATG ATG ATG AAT TGG TCA CCC GCG ACA GCC TTG GTG ATG 429 

GCC CAA ATG CTA CGG ATT CCC CAG GTG GTC ATT GAC ATC 468 

ATT GCC GGG GGC CAC TGG GGG GTC TTG TTC GCC GCT GCA 507 

TAC TTC GCG TCG GCG GCT AAC TGG GCT AAG GTT GTG CTG 546

GTC TTG TTT CTG TTT GCG GGG GTT GAT GCC 576 


(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA7 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:


GTC CCC TAC CGA AAT GCC TCC GGG GTT TAT CAT GTC ACC 39 

AAT GAT TGC CCG AAC TCT TCC ATA GTC TAT GAG GCT GAC 78 

AAC CTG ATC CTG CAC GCA CCT GGT TGC GTG CCC TGT GTC 117 

AGA CAA AAT AAT GTC AGT AGG TGC TGG GTC CAA ATC ACC 156 

CCC ACA TTG TCA GCC CCG AAC CTC GGA GCG GTC ACG GCT 195 

CCT CTT CGG AGG GCC GTT GAC TAC CTA GCG GGA GGG GCT 234

GCC CTC TGC TCC GCG CTA TAC GTC GGG GAC GCG TGC GGG 273

GCA GTG TTT TTG GTA GGC CAG ATG TTC AGC TAT AGG CCT 312 

CGC CAG CAC ACT ACG GTG CAG GAC TGC AAC TGT TCC ATT 351 

TAC AGT GGC CAT ATC ACC GGC CAC CGA ATG GCA TGG GAC 390 

ATG ATG ATG AAT TGG TCA CCT ACG ACA GCC TTG GTG ATG 429 

GCC CAG TTG CTA CGG ATT CCC CAG GTG GTC ATC GAC ATC 468 

ATT GCC GGG GGC CAC TGG GGG GTC TTG TTC GCC GCC GCA 507 

TAT TTC GCG TCA GCG GCT AAC TGG GCT AAG GTT GTG CTG 546

GTC TTG TTT CTG TTT GCG GGG GTC GAT GCC 576 


(2) INFORMATION FOR SEQ ID NO: 50:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 576 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA13 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 


GTT CCC TAC CGA AAT GCC TCT GGG GTT TAT CAT GTC ACC 39 
AAT GAT TGC CCA AAC TCT TCC ATC GTC TAC GAG GCT GAT 78 
GAC CTG ATC TTA CAC GCA CCT GGT TGC GTG CCC TGT GTT 117 
AGG CAG GGT AAT GTC AGT AGG TGC TGG GTC CAG ATC ACC 156 
CCC ACA CTG TCA GCC CCG AGC CTC GGA GCG GTC ACG GCT 195 
CCT CTT CGG AGG GCC GTT GAC TAC TTA GCG GGG GGG GCT 234 
GCC CTT TGC TCC GCG TTA TAC GTC GGA GAC GCG TGC GGG 273 
GCA GTG TTT TTG GTA GGT CAA ATG TTC ACC TAT AGC CCT 312 
CGC CGG CAT AAT GTT GTG CAG GAC TGC AAC TGT TCC ATT 351 
TAC AGT GGC CAC ATC ACC GGC CAC CGG ATG GCA TGG GAC 390 
ATG ATG ATG AAT TGG TCA CCT ACA ACA GCT TTG GTG ATG 429 
GCC CAG TTG TTA CGG ATT CCC CAG GTG GTC ATT GAC ATC 468 
ATT GCC GGG GCC CAC TGG GGG GTC TTG TTC GCC GCC GCA 507 
TAC TAC GCG TCG GCG GCT AAC TGG GCC AAG GTT GTG CTG 546 
GTC CTG TTT CTG TTT GCG GGG GTC GAT GCC 576 
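
Isolates in this listing are compared at the nucleotide level, so a position-by-position comparison of two aligned lines is a natural way to read the tables. The following is a minimal sketch, not the method used in the application, that computes percent identity and the mismatching positions for two equal-length lines. The helper names `mismatch_positions` and `percent_identity` are hypothetical; the inputs are the first lines of SEQ ID NO: 47 (isolate SA5) and SEQ ID NO: 50 (isolate SA13) as printed above, with the running counters omitted.

```python
def strip_spaces(line):
    """Remove the codon spacing used in the listing."""
    return line.replace(" ", "")


def mismatch_positions(a, b):
    """Return 1-based positions at which two equal-length sequences differ."""
    a, b = strip_spaces(a), strip_spaces(b)
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


def percent_identity(a, b):
    """Percentage of positions that are identical in two aligned sequences."""
    length = len(strip_spaces(a))
    return 100.0 * (length - len(mismatch_positions(a, b))) / length


# First lines (bases 1-39) of SEQ ID NO: 47 and SEQ ID NO: 50.
sa5 = "GTC CCC TAC CGA AAT GCC TCT GGG GTT TAT CAT GTC ACC"
sa13 = "GTT CCC TAC CGA AAT GCC TCT GGG GTT TAT CAT GTC ACC"

positions = mismatch_positions(sa5, sa13)
identity = percent_identity(sa5, sa13)
print(positions, round(identity, 1))
```

Because all E1 sequences in the listing have the same length (576 base pairs), the same comparison can be applied to complete sequences without a separate alignment step, under the assumption that no insertions or deletions are present.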


(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 576 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: HK2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 


CTT ACC TAC GGC AAC TCC AGT GGG CTA TAC CAT CTC ACA 39 
AAT GAT TGC CCC AAC TCC AGC ATC GTG CTG GAG GCG GAT 78 
GCT ATG ATC TTG CAT TTG CCT GGA TGC TTG CCT TGT GTG 117 
AGG GTC GAT GAT CGG TCC ACC TGT TGG CAT GCT GTG ACC 156
CCC ACC CTG GCC ATA CCA AAT GCT TCC ACG CCC GCA ACG 195
GGA TTC CGC AGG CAT GTG GAT CTT CTT GCG GGC GCC GCA 234
GTG GTT TGC TCA TCC CTG TAC ATC GGG GAC CTG TGT GGC 273
TCT CTC TTT TTG GCG GGA CAA CTA TTC ACC TTT CAG CCC 312
CGC CGT CAT TGG ACT GTG CAA GAC TGC AAC TGC TCC ATC 351
TAT ACA GGC CAC GTC ACC GGC CAC AGG ATG GCT TGG GAC 390
ATG ATG ATG AAC TGG TCA CCC ACA ACC ACT CTG GTC CTA 429
TCT AGC ATC TTG AGG GTA CCT GAG ATT TGT GCG AGT GTG 468
ATA TTT GGT GGC CAT TGG GGG ATA CTA CTA GCC GTT GCC 507
TAC TTT GGC ATG GCT GGC AAC TGG CTA AAA GTT CTG GCT 546
GTT CTG TTC CTA TTT GCA GGG GTT GAA GCA 576


(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 192 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: DK7


(xi) SEQUENCE DESCRIPTION: SEQ ID NO:52: 


Tyr Gln Val Arg Asn Ser Thr Gly Leu Tyr His Val Thr Asn Asp
                5                   10                  15

Cys Pro Asn Ser Ser Ile Val Tyr Glu Ala Ala Asp Ala Ile Leu
                20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Gly Asn Val Ser
                35                  40                  45

Arg Cys Trp Val Ala Met Thr Pro Thr Val Ala Thr Arg Asp Gly
                50                  55                  60

Lys Leu Pro Thr Ala Gln Leu Arg Arg His Ile Asp Leu Leu Val
                65                  70                  75

Gly Ser Ala Thr Leu Cys Ser Ala Leu Tyr Val Gly Asp Leu Cys
                80                  85                  90

Gly Ser Val Phe Leu Val Gly Gln Leu Phe Thr Phe Ser Pro Arg
                95                  100                 105

Arg His Trp Thr Thr Gln Gly Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Thr Ala Leu Val Val Ala Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Ile Leu Asp Met Ile Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Ile Ala Tyr Phe Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Val Val Leu Leu Leu Phe Ala Gly Val Asp Ala
                185                 190

(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK9 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:53: 


Tyr Gln Val Arg Asn Ser Ser Gly Leu Tyr His Val Thr Asn Asp
                5                   10                  15

Cys Pro Asn Ser Ser Ile Val Tyr Glu Ala Ala Asp Ala Ile Leu
                20                  25                  30

His Ser Pro Gly Cys Val Pro Cys Val Arg Glu Gly Asn Ala Ser
                35                  40                  45

Lys Cys Trp Val Ala Val Ala Pro Thr Val Ala Thr Arg Asp Gly
                50                  55                  60

Lys Leu Pro Ala Thr Gln Leu Arg Arg His Ile Asp Leu Leu Val
                65                  70                  75

Gly Ser Ala Thr Leu Cys Ser Ala Leu Tyr Val Gly Asp Leu Cys
                80                  85                  90

Gly Ser Val Phe Leu Val Gly Gln Leu Phe Thr Phe Ser Pro Arg
                95                  100                 105

Arg His Trp Thr Thr Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Ala Ala Leu Val Met Ala Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Ile Leu Asp Met Ile Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Ile Ala Tyr Phe Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Val Val Val Leu Leu Leu Phe Thr Gly Val Asp Ala
                185                 190


(2) INFORMATION FOR SEQ ID NO: 54:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 192 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: DR1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:



His Gln Val Arg Asn Ser Thr Gly Leu Tyr His Val Thr Asn Asp
                5                   10                  15

Cys Pro Asn Ser Ser Ile Val Tyr Glu Ala Ala Asp Ala Ile Leu
                20                  25                  30

His Ala Pro Gly Cys Val Pro Cys Val Arg Glu Gly Asn Ala Ser
                35                  40                  45

Arg Cys Trp Val Ala Val Thr Pro Thr Val Ala Thr Arg Asp Gly
                50                  55                  60

Lys Leu Pro Thr Thr Gln Leu Arg Arg His Ile Asp Leu Leu Val
                65                  70                  75

Gly Ser Ala Thr Leu Cys Ser Ala Leu Tyr Val Gly Asp Leu Cys
                80                  85                  90

Gly Ser Val Phe Leu Val Gly Gln Leu Phe Thr Phe Ser Pro Arg
                95                  100                 105

Arg His Trp Thr Thr Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Thr Ala Leu Val Met Ala Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Ile Leu Asp Met Ile Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Ile Ala Tyr Phe Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Val Val Val Leu Leu Leu Phe Ala Gly Val Asp Ala
                185                 190






(2) INFORMATION FOR SEQ ID NO: 55:








(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DR4 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 

His Gln Val Arg Asn Ser Thr Gly Leu Tyr His Val Thr Asn Asp
                5                   10                  15

Cys Pro Asn Ser Ser Ile Val Tyr Glu Ala Ala Asp Ala Ile Leu
                20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Gly Asn Thr Ser
                35                  40                  45

Arg Cys Trp Val Ala Val Thr Pro Thr Val Ala Thr Arg Asp Gly
                50                  55                  60

Lys Leu Pro Thr Thr Gln Leu Arg Arg His Ile Asp Leu Leu Val
                65                  70                  75

Gly Ser Ala Thr Leu Cys Ser Ala Leu Tyr Val Gly Asp Leu Cys
                80                  85                  90

Gly Ser Val Phe Leu Val Gly Gln Leu Phe Thr Phe Ser Pro Arg
                95                  100                 105

His His Trp Thr Thr Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Thr Ala Leu Val Val Ala Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Ile Leu Asp Met Ile Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Ile Ala Tyr Phe Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Val Val Leu Leu Leu Phe Ala Gly Val Asp Ala
                185                 190


(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 192 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: S14

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:

Tyr Gln Val Arg Asn Ser Thr Gly Leu Tyr His Val Thr Asn Asp
                5                   10                  15

Cys Pro Asn Ser Ser Ile Val Tyr Glu Thr Ala Asp Ala Ile Leu
                20                  25                  30

His Ala Pro Gly Cys Val Pro Cys Val Arg Glu Gly Asn Thr Ser
                35                  40                  45

Arg Cys Trp Val Ala Met Thr Pro Thr Val Ala Thr Arg Asp Gly
                50                  55                  60

Lys Leu Pro Ala Thr Gln Leu Arg Arg Tyr Ile Asp Leu Leu Val
                65                  70                  75

Gly Ser Ala Thr Leu Cys Ser Ala Leu Tyr Val Gly Asp Leu Cys
                80                  85                  90

Gly Ser Val Phe Leu Val Gly Gln Leu Phe Thr Phe Ser Pro Arg
                95                  100                 105

Arg Leu Trp Thr Thr Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Thr Ala Leu Val Val Ala Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Ile Leu Asp Met Ile Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Ile Ala Tyr Phe Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Val Val Leu Leu Leu Phe Ala Gly Val Asp Ala
                185                 190

(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S18 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 


Tyr Gln Val Arg Asn Ser Thr Gly Leu Tyr His Val Thr Asn Asp
                5                   10                  15

Cys Pro Asn Ser Ser Ile Val Tyr Glu Thr Ala Asp Thr Ile Leu
                20                  25                  30

His Ser Pro Gly Cys Val Pro Cys Val Arg Glu Gly Asn Ala Ser
                35                  40                  45

Arg Cys Trp Val Pro Val Ala Pro Thr Val Ala Thr Arg Asp Gly
                50                  55                  60

Lys Leu Pro Ala Thr Gln Leu Arg Arg His Ile Asp Leu Leu Val
                65                  70                  75

Gly Ser Ala Thr Leu Cys Ser Ala Leu Tyr Val Gly Asp Leu Cys
                80                  85                  90

Gly Ser Val Phe Leu Val Ser Gln Leu Phe Thr Ile Ser Pro Arg
                95                  100                 105

Arg His Trp Thr Thr Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Thr Ala Leu Val Ile Ala Gln Leu Leu Arg Val Pro
                140                 145                 150

Gln Ala Val Leu Asp Met Ile Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165


Ala Gly Ile Ala Tyr Phe Ser Met Ala Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Leu Val Leu Leu Leu Phe Ala Gly Val Asp Ala
                185                 190

(2) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 192 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: SW1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:

Tyr Gln Val Arg Asn Ser Ser Gly Leu Tyr His Val Thr Asn Asp
                5                   10                  15

Cys Pro Asn Ser Ser Ile Val Tyr Glu Thr Ala Asp Ala Ile Leu
                20                  25                  30

His Ser Pro Gly Cys Val Pro Cys Val Arg Glu Asp Gly Ala Pro
                35                  40                  45

Lys Cys Trp Val Ala Val Ala Pro Thr Val Ala Thr Arg Asp Gly
                50                  55                  60

Lys Leu Pro Ala Thr Gln Leu Arg Arg His Ile Asp Leu Leu Val
                65                  70                  75

Gly Ser Ala Thr Leu Cys Ser Ala Leu Tyr Val Gly Asp Leu Cys
                80                  85                  90

Gly Ser Val Phe Leu Val Ser Gln Leu Phe Thr Phe Ser Pro Arg
                95                  100                 105

Arg His Trp Thr Thr Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Thr Ala Leu Val Val Ala Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Val Leu Asp Met Ile Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Ile Ala Tyr Phe Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Leu Leu Leu Phe Ser Gly Val Asp Ala
                185                 190

(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 192 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown


(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 
(C) INDIVIDUAL ISOLATE: US11 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 


Tyr Gln Val Arg Asn Ser Thr Gly Leu Tyr His Val Thr Asn Asp
                  5                  10                  15

Cys Pro Asn Ser Ser Ile Val Tyr Glu Ala Ala Asp Ala Ile Leu
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Gly Asn Ala Ser
                 35                  40                  45

Arg Cys Trp Val Ala Met Thr Pro Thr Val Ala Thr Arg Asp Gly
                 50                  55                  60

Lys Leu Pro Thr Thr Gln Leu Arg Arg His Ile Asp Leu Leu Val
                 65                  70                  75

Gly Ser Ala Thr Leu Cys Ser Ala Leu Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Val Gly Gln Leu Phe Thr Phe Ser Pro Arg
                 95                 100                 105

Arg His Trp Thr Thr Gln Gly Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Ala Ala Leu Val Val Ala Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Ile Leu Asp Met Ile Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Ile Ala Tyr Phe Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Val Val Leu Leu Leu Phe Ala Gly Val Asp Ala
                185                 190

(2) INFORMATION FOR SEQ ID NO: 60:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: D1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 



Tyr Glu Val Arg Asn Val Ser Gly Val Tyr His Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Thr Ala Asp Met Ile Met
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Asp Asn Ser Ser
                 35                  40                  45

Arg Cys Trp Val Ala Leu Thr Pro Thr Leu Ala Ala Arg Asn Gly
                 50                  55                  60

Asn Val Pro Thr Thr Ala Ile Arg Arg His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Ala Phe Cys Ser Ala Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Ile Ser Gln Leu Phe Thr Leu Ser Pro Arg
                 95                 100                 105

Arg His Glu Thr Val Gln Glu Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Val Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Thr Ala Leu Val Val Ser Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Val Met Asp Met Val Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Met Leu Leu Phe Ala Gly Val Asp Gly
                185                 190


(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: D3 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 


Tyr Glu Val Arg Asn Val Ser Gly Val Tyr Gln Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Thr Ala Asp Met Ile Met
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Asp Asn Ser Ser
                 35                  40                  45

Arg Cys Trp Val Ala Leu Thr Pro Thr Leu Ala Ala Arg Asn Ser
                 50                  55                  60

Ser Val Pro Thr Thr Thr Ile Arg Arg His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Ala Phe Cys Ser Ala Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Val Ser Gln Leu Phe Thr Phe Ser Pro Arg
                 95                 100                 105

Arg His Glu Thr Val Gln Glu Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Val Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Ala Ala Leu Val Val Ser Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Val Val Asp Met Val Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Met Leu Leu Phe Ala Gly Val Asp Gly
                185                 190


(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK1 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 


Tyr Glu Val Arg Asn Val Ser Gly Val Tyr His Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Ala Val Asp Val Ile Met
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Asn Asn His Ser
                 35                  40                  45

Arg Cys Trp Val Ala Leu Thr Pro Thr Leu Ala Ala Arg Asn Ala
                 50                  55                  60

Ser Ile Pro Thr Thr Thr Ile Arg Arg His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Ala Phe Cys Ser Ala Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Val Ser Gln Leu Phe Thr Phe Ser Pro Arg
                 95                 100                 105

Arg His Glu Thr Ala Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Val Ser Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Thr Ala Leu Val Leu Ser Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Val Val Asp Met Val Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Ala Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Leu Leu Leu Phe Ala Gly Val Asp Gly
                185                 190


(2) INFORMATION FOR SEQ ID NO: 63:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 192 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: HK3 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 



Tyr Glu Val Arg Asn Val Ser Gly Ile Tyr His Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Ser Ser Val Val Tyr Glu Thr Ala Asp Met Ile Met
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Asn Asn Ser Ser
                 35                  40                  45

Arg Cys Trp Val Ala Leu Thr Pro Thr Leu Ala Ala Arg Asn Val
                 50                  55                  60

Ser Val Pro Thr Thr Thr Ile Arg Arg His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Ala Phe Cys Ser Ala Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Val Ser Gln Leu Phe Thr Phe Ser Pro Arg
                 95                 100                 105

Arg His Glu Thr Val Gln Asp Cys Asn Cys Ser Leu Tyr Pro Gly
                110                 115                 120

His Val Ser Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Ala Ala Leu Val Val Ser Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Val Val Asp Met Val Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Met Leu Leu Phe Ala Gly Val Asp Gly
                185                 190







(2) INFORMATION FOR SEQ ID NO:64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: HK4 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 


His Glu Val His Asn Val Ser Gly Ile Tyr His Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Ala Ala Asp Met Ile Met
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Asn Asn Ser Ser
                 35                  40                  45

Arg Cys Trp Val Ala Leu Thr Pro Thr Leu Ala Ala Arg Asn Ala
                 50                  55                  60

Ser Ile Pro Thr Thr Thr Ile Arg Arg His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Ala Phe Cys Ser Ala Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Val Ser Gln Leu Phe Thr Phe Ser Pro Arg
                 95                 100                 105

Arg His Glu Thr Val Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Val Ser Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Ala Ala Leu Val Val Ser Gln Leu Leu Arg Leu Pro
                140                 145                 150

Gln Ala Val Met Asp Met Val Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Met Leu Leu Phe Ala Gly Val Asp Gly
                185                 190







(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: HK5 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:

Tyr Glu Val Arg Asn Val Ser Gly Val Tyr His Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Leu Ser Ile Val Tyr Glu Thr Thr Asp Met Ile Met
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Asn Asn Ser Ser
                 35                  40                  45

Arg Cys Trp Val Ala Leu Ala Pro Thr Leu Ala Ala Arg Asn Ala
                 50                  55                  60

Ser Val Pro Thr Thr Ala Ile Arg Arg His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Ala Phe Cys Ser Ala Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Val Ser Gln Leu Phe Thr Phe Ser Pro Arg
                 95                 100                 105

Arg His Glu Thr Val Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Val Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Thr Ala Leu Val Val Ser Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Val Val Asp Met Val Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Met Leu Leu Phe Ala Gly Val Asp Gly
                185                 190

(2) INFORMATION FOR SEQ ID NO: 66:


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: HK8 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 


Tyr Glu Val Arg Asn Val Ser Gly Ile Tyr His Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Thr Ala Asp Met Ile Met
                 20                  25                  30

His Thr Pro Gly Cys Met Pro Cys Val Arg Glu Asn Asn Ser Ser
                 35                  40                  45

Arg Cys Trp Val Ala Leu Thr Pro Thr Leu Ala Ala Arg Asn Val
                 50                  55                  60

Ser Val Pro Thr Thr Thr Ile Arg Arg His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Ala Phe Cys Ser Ala Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Val Ser Gln Leu Phe Thr Phe Ser Pro Arg
                 95                 100                 105

Arg His Glu Thr Val Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Val Ser Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Thr Ala Leu Val Val Ser Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Ile Val Asp Met Val Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Met Leu Leu Phe Ala Gly Val Asp Gly
                185                 190


(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: IND5 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 


Tyr Glu Val Arg Asn Val Ser Gly Val Tyr His Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Ala Ala Asp Met Ile Met
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Gly Asn Ser Ser
                 35                  40                  45

Arg Cys Trp Val Ala Leu Thr Pro Thr Leu Ala Ala Arg Asn Ala
                 50                  55                  60

Ser Val Ser Thr Thr Thr Ile Arg His His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Ala Phe Cys Ser Ala Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Val Ser Gln Leu Phe Thr Phe Ser Pro Arg
                 95                 100                 105

Arg His Glu Thr Val Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Val Ser Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Ala Ala Leu Val Val Ser Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Val Val Asp Met Val Ala Gly Ala His Trp Gly Ile Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Met Leu Leu Phe Ala Gly Val Asp Gly
                185                 190







(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: IND8 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO:68: 


Tyr Glu Val Arg Asn Val Ser Gly Val Tyr His Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Ala Ala Asp Met Ile Met
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Gly Asn Phe Ser
                 35                  40                  45

Ser Cys Trp Val Ala Leu Thr Pro Thr Leu Ala Ala Arg Asn Ala
                 50                  55                  60

Ser Val Pro Thr Thr Thr Ile Arg Arg His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Ala Phe Cys Ser Ala Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Val Ser Gln Leu Phe Thr Phe Ser Pro Arg
                 95                 100                 105

Arg His Glu Thr Val Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Val Ser Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Ala Ala Leu Val Val Ser Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Val Val Asp Met Val Ala Gly Ala His Trp Gly Ile Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Met Leu Leu Phe Ala Gly Val Asp Gly
                185                 190


(2) INFORMATION FOR SEQ ID NO: 69: 


(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: P10 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 


Tyr Glu Val Arg Asn Val Ser Gly Val Tyr His Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Ala Ala Asp Met Ile Met
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Asn Asn Ser Ser
                 35                  40                  45

Arg Cys Trp Val Ala Leu Thr Pro Thr Leu Ala Ala Arg Asn Ser
                 50                  55                  60

Ser Val Pro Thr Thr Ala Ile Arg Arg His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Ala Phe Cys Ser Ala Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Leu Leu Val Ser Gln Leu Phe Thr Phe Ser Pro Arg
                 95                 100                 105

Arg His Trp Thr Val Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Val Ser Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Ala Ala Leu Val Val Ser Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Ile Leu Asp Val Val Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Met Leu Leu Phe Ala Gly Val Asp Gly
                185                 190

(2) INFORMATION FOR SEQ ID NO: 70:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S9

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70:

Tyr Glu Val Arg Asn Val Ser Gly Ala Tyr His Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Ala Ala Asp Val Ile Met
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Gln Glu Gly Asn Ser Ser
                 35                  40                  45

Gln Cys Trp Val Ala Leu Thr Pro Thr Leu Ala Ala Arg Asn Ala
                 50                  55                  60

Thr Val Pro Thr Thr Thr Ile Arg Arg His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Val Phe Cys Ser Ala Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Ile Ser Gln Leu Phe Thr Ile Ser Pro Arg
                 95                 100                 105

Arg His Glu Thr Val Gln Asn Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Val Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Thr Ala Leu Val Val Ser Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Val Met Asp Met Val Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Met Leu Leu Phe Ala Gly Val Asp Gly
                185                 190


(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 192 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S45 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 


Tyr Glu Val Arg Asn Val Ser Gly Ala Tyr His Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Ala Val Asp Val Ile Leu
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Asn Asn Ser Ser
                 35                  40                  45

Arg Cys Trp Val Ala Leu Thr Pro Thr Leu Ala Ala Arg Asn Ser
                 50                  55                  60

Ser Val Pro Thr Thr Thr Ile Arg Arg His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Ala Phe Cys Ser Ala Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Val Ser Gln Leu Phe Thr Phe Ser Pro Arg
                 95                 100                 105

Arg His Glu Thr Val Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Val Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Ala Ala Leu Val Val Ser Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Val Val Asp Met Val Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Met Leu Leu Phe Ala Gly Val Asp Gly
                185                 190







(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA10 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO:72: 


Tyr Glu Val Arg Asn Val Ser Gly Met Tyr His Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Ala Ala Asp Met Ile Met
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Asn Asn Ser Ser
                 35                  40                  45

Arg Cys Trp Val Ala Leu Thr Pro Thr Leu Ala Ala Arg Asn Ser
                 50                  55                  60

Ser Val Pro Thr Thr Thr Ile Arg Arg His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Ala Phe Cys Ser Ala Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Val Ser Gln Leu Phe Thr Phe Ser Pro Arg
                 95                 100                 105

Arg Tyr Glu Thr Val Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

Arg Val Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Thr Ala Leu Val Val Ser Gln Leu 3 Q) Arg Ile Pro
                140                 145                 150

Gln Ala Ile Val Asp Met Val Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Met Leu Leu Phe Ala Gly Val Asp Gly
                185                 190

(2) INFORMATION FOR SEQ ID NO: 73:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SW2 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 


Tyr Glu Val Arg Asn Val Ser Gly Val Tyr His Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Thr Ala Asp Met Ile Met
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Ala Asn Ser Ser
                 35                  40                  45

Arg Cys Trp Val Ala Leu Thr Pro Thr Leu Ala Ala Arg Asn Thr
                 50                  55                  60

Ser Val Pro Thr Thr Thr Ile Arg Arg His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Ala Phe Cys Ser Val Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Val Ser Gln Leu Phe Thr Phe Ser Pro Arg
                 95                 100                 105

Arg His Glu Thr Val Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Val Ser Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Ala Ala Leu Val Val Ser Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Val Val Asp Met Val Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Met Leu Leu Phe Ala Gly Val Asp Gly
                185                 190







(2) INFORMATION FOR SEQ ID NO: 74:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: T3 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 


Tyr Glu Val Arg Asn Val Ser Gly Val Tyr Tyr Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Thr Ala Asp Met Ile Met
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Ser Asn Ser Ser
                 35                  40                  45

Arg Cys Trp Val Ala Leu Thr Pro Thr Leu Ala Ala Arg Asn Ala
                 50                  55                  60

Ser Val Pro Thr Lys Thr Ile Arg Arg His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Ala Phe Cys Ser Ala Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Val Ser Gln Leu Phe Thr Phe Ser Pro Arg
                 95                 100                 105

Arg His Glu Thr Val Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Val Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Thr Ala Leu Val Val Ser Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Val Val Asp Met Val Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Leu Leu Leu Phe Ala Gly Val Asp Gly
                185                 190


(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: T10

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75:

Tyr Glu Val Arg Asn Val Ser Gly Met Tyr His Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Ser Ser Ile Val Phe Glu Ala Ala Asp Leu Ile Met
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Gly Asn Ser Ser
                 35                  40                  45

Arg Cys Trp Val Ala Leu Thr Pro Thr Leu Ala Ala Arg Asn Thr
                 50                  55                  60

Ser Val Pro Thr Thr Thr Ile Arg Arg His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Ala Phe Cys Ser Ala Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Val Ser Gln Leu Phe Thr Phe Ser Pro Arg
                 95                 100                 105

Arg His Glu Thr Leu Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Leu Ser Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Thr Ala Leu Val Val Ser Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Val Met Asp Met Val Thr Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Ala Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Met Leu Leu Phe Ala Gly Val Asp Gly
                185                 190







(2) INFORMATION FOR SEQ ID NO: 76: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: US6 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76: 


Tyr Glu Val Arg Asn Val Ser Gly Met Tyr His Val Thr Asn Asp
                  5                  10                  15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Ala Ala Asp Met Ile Met
                 20                  25                  30

His Thr Pro Gly Cys Val Pro Cys Val Arg Glu Asn Asn Ser Ser
                 35                  40                  45

Arg Cys Trp Val Ala Leu Thr Pro Thr Leu Ala Ala Arg Asn Ala
                 50                  55                  60

Ser Val Pro Thr Thr Thr Ile Arg Arg His Val Asp Leu Leu Val
                 65                  70                  75

Gly Ala Ala Thr Phe Cys Ser Ala Met Tyr Val Gly Asp Leu Cys
                 80                  85                  90

Gly Ser Val Phe Leu Ile Ser Gln Leu Phe Thr Phe Ser Pro Arg
                 95                 100                 105

Gln His Glu Thr Val Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly
                110                 115                 120

His Val Ser Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp
                125                 130                 135

Ser Pro Thr Ala Ala Leu Val Val Ser Gln Leu Leu Arg Ile Pro
                140                 145                 150

Gln Ala Val Met Asp Met Val Ala Gly Ala His Trp Gly Val Leu
                155                 160                 165

Ala Gly Leu Ala Tyr Tyr Ser Met Val Gly Asn Trp Ala Lys Val
                170                 175                 180

Leu Ile Val Leu Leu Leu Phe Ala Gly Val Asp Gly
                185                 190







(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: T2 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: 



Ala Gln Val Arg Asn Thr Ser Arg Gly Tyr Met Val Thr Asn Asp

5 10 15

Cys Ser Asn Glu Ser Ile Thr Trp Gln Leu Gln Ala Ala Val Leu

20 25 30

His Val Pro Gly Cys Ile Pro Cys Glu Arg Leu Gly Asn Thr Ser

35 40 45

Arg Cys Trp Ile Pro Val Thr Pro Asn Val Ala Val Arg Gln Pro

50 55 60

Gly Ala Leu Thr Gln Gly Leu Arg Thr His Ile Asp Met Val Val

65 70 75

Met Ser Ala Thr Leu Cys Ser Ala Leu Tyr Val Gly Asp Leu Cys

80 85 90

Gly Gly Val Met Leu Ala Ala Gln Met Phe Ile Val Ser Pro Arg

95 100 105

Arg His Trp Phe Val Gln Glu Cys Asn Cys Ser Ile Tyr Pro Gly

110 115 120

Thr Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp

125 130 135

Ser Pro Thr Ala Thr Met Ile Leu Ala Tyr Ala Met Arg Val Pro

140 145 150

Glu Val Ile Ile Asp Ile Ile Gly Gly Ala His Trp Gly Val Met

155 160 165

Phe Gly Leu Ala Tyr Phe Ser Met Gln Gly Ala Trp Ala Lys Val

170 175 180

Ile Val Ile Leu Leu Leu Ala Ala Gly Val Asp Ala

185 190

(2) INFORMATION FOR SEQ ID NO: 78:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: T4 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO:78: 


Ala Gln Val Lys Asn Thr Thr Asn Ser Tyr Met Val Thr Asn Asp

5 10 15

Cys Ser Asn Asp Ser Ile Thr Trp Gln Leu Gln Ala Ala Val Leu

20 25 30

His Val Pro Gly Cys Val Pro Cys Glu Lys Thr Gly Asn Thr Ser

35 40 45

Arg Cys Trp Ile Pro Val Ser Pro Asn Val Ala Val Arg Gln Pro

50 55 60

Gly Ala Leu Thr Gln Gly Leu Arg Thr His Ile Asp Met Val Val

65 70 75

Met Ser Ala Thr Leu Cys Ser Ala Leu Tyr Val Gly Asp Leu Cys

80 85 90

Gly Gly Val Met Leu Ala Ala Gln Met Phe Ile Val Ser Pro Gln

95 100 105

His His Trp Phe Val Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly

110 115 120

Thr Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp

125 130 135

Ser Pro Thr Ala Thr Met Ile Leu Ala Tyr Ala Met Arg Val Pro

140 145 150

Glu Val Ile Leu Asp Ile Val Ser Gly Ala His Trp Gly Val Met

155 160 165

Phe Gly Leu Ala Tyr Phe Ser Met Gln Gly Ala Trp Ala Lys Val

170 175 180

Val Val Ile Leu Leu Leu Ala Ala Gly Val Asp Ala

185 190

(2) INFORMATION FOR SEQ ID NO: 79:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 192 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: T9

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79:

Ala Glu Val Lys Asn Thr Ser Thr Ser Tyr Met Val Thr Asn Asp

5 10 15

Cys Ser Asn Asp Ser Ile Thr Trp Gln Leu Gln Ala Ala Val Leu

20 25 30

His Val Pro Gly Cys Val Pro Cys Glu Arg Val Gly Asn Ala Ser

35 40 45

Arg Cys Trp Ile Pro Val Ser Pro Asn Val Ala Val Gln Arg Pro

50 55 60

Gly Ala Leu Thr Gln Gly Leu Arg Thr His Ile Asp Met Val Val

65 70 75

Met Ser Ala Thr Leu Cys Ser Ala Leu Tyr Val Gly Asp Leu Cys

80 85 90

Gly Gly Val Met Leu Ala Ala Gln Met Phe Ile Ile Ser Pro Gln

95 100 105

His His Trp Phe Val Gln Glu Cys Asn Cys Ser Ile Tyr Pro Gly

110 115 120

Thr Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp

125 130 135

Ser Pro Thr Thr Thr Met Ile Leu Ala Tyr Ala Met Arg Val Pro

140 145 150

Glu Val Ile Ile Asp Ile Ile Ser Gly Ala His Trp Gly Val Met

155 160 165

Phe Gly Leu Ala Tyr Phe Ser Met Gln Gly Ala Trp Ala Lys Val

170 175 180

Val Val Ile Leu Leu Leu Thr Ala Gly Val Asp Ala

185 190


(2) INFORMATION FOR SEQ ID NO: 80:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 192 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: US10


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80: 


Val Gln Val Lys Asn Thr Ser Thr Ser Tyr Met Val Thr Asn Asp

5 10 15

Cys Ser Asn Asp Ser Ile Thr Trp Gln Leu Glu Ala Ala Val Leu

20 25 30

His Val Pro Gly Cys Val Pro Cys Glu Lys Val Gly Asn Thr Ser

35 40 45

Arg Cys Trp Ile Pro Val Ser Pro Asn Val Ala Val Gln Arg Pro

50 55 60

Gly Ala Leu Thr Gln Gly Leu Arg Thr His Ile Asp Met Val Val

65 70 75

Met Ser Ala Thr Leu Cys Ser Ala Leu Tyr Val Gly Asp Phe Cys

80 85 90

Gly Gly Met Met Leu Ala Ala Gln Met Phe Ile Val Ser Pro Arg

95 100 105

His His Ser Phe Val Gln Glu Cys Asn Cys Ser Ile Tyr Pro Gly

110 115 120

Thr Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp

125 130 135

Ser Pro Thr Ala Thr Leu Ile Leu Ala Tyr Val Met Arg Val Pro

140 145 150

Glu Val Ile Ile Asp Ile Ile Ser Gly Ala His Trp Gly Val Leu

155 160 165

Phe Gly Leu Ala Tyr Phe Ser Met Gln Gly Ala Trp Ala Lys Val

170 175 180

Val Val Ile Leu Leu Leu Ala Ala Gly Val Asp Ala

185 190


(2) INFORMATION FOR SEQ ID NO: 81: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK8 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81: 


Val Glu Val Arg Asn Ile Ser Ser Ser Tyr Tyr Ala Thr Asn Asp

5 10 15

Cys Ser Asn Asn Ser Ile Thr Trp Gln Leu Thr Asp Ala Val Leu

20 25 30

His Leu Pro Gly Cys Val Pro Cys Glu Asn Asp Asn Gly Thr Leu

35 40 45

Arg Cys Trp Ile Gln Val Thr Pro Asn Val Ala Val Lys His Arg

50 55 60

Gly Ala Leu Thr His Asn Leu Arg Thr His Val Asp Val Ile Val

65 70 75

Met Ala Ala Thr Val Cys Ser Ala Leu Tyr Val Gly Asp Val Cys

80 85 90

Gly Ala Val Met Ile Val Ser Gln Ala Leu Ile Ile Ser Pro Glu

95 100 105

Arg His Asn Phe Thr Gln Glu Cys Asn Cys Ser Ile Tyr Gln Gly

110 115 120

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Leu Asn Trp

125 130 135

Ser Pro Thr Leu Thr Met Ile Leu Ala Tyr Ala Ala Arg Val Pro

140 145 150

Glu Leu Ala Leu Gln Val Val Phe Gly Gly His Trp Gly Val Val

155 160 165

Phe Gly Leu Ala Tyr Phe Ser Met Gln Gly Ala Trp Ala Lys Val

170 175 180

Ile Ala Ile Leu Leu Leu Val Ala Gly Val Asp Ala

185 190







(2) INFORMATION FOR SEQ ID NO: 82:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 192 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: DK11

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82:



Val Glu Val Arg Asn Thr Ser Ser Ser Tyr Tyr Ala Thr Asn Asp

5 10 15

Cys Ser Asn Asn Ser Ile Thr Trp Gln Leu Thr Asn Ala Val Leu

20 25 30

His Leu Pro Gly Cys Val Pro Cys Glu Asn Asp Asn Gly Thr Leu

35 40 45

His Cys Trp Ile Gln Val Thr Pro Asn Val Ala Val Lys His Arg

50 55 60

Gly Ala Leu Thr His Asn Leu Arg Ala His Ile Asp Met Ile Val

65 70 75

Met Ala Ala Thr Val Cys Ser Ala Leu Tyr Val Gly Asp Val Cys

80 85 90

Gly Ala Val Met Ile Val Ser Gln Ala Phe Ile Val Ser Pro Glu

95 100 105

His His His Phe Thr Gln Glu Cys Asn Cys Ser Ile Tyr Gln Gly

110 115 120

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Leu Asn Trp

125 130 135

Ser Pro Thr Leu Thr Met Ile Leu Ala Tyr Ala Ala Arg Val Pro

140 145 150

Glu Leu Val Leu Glu Val Val Phe Gly Gly His Trp Gly Val Val

155 160 165

Phe Gly Leu Ala Tyr Phe Ser Met Gln Gly Ala Trp Ala Lys Val

170 175 180

Ile Ala Ile Leu Leu Leu Val Ala Gly Val Asp Ala

185 190

(2) INFORMATION FOR SEQ ID NO: 83:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SW3 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:83: 

Val Glu Val Arg Asn Ile Ser Ser Ser Tyr Tyr Ala Thr Asn Asp

5 10 15

Cys Ser Asn Ser Ser Ile Thr Trp Gln Leu Thr Asn Ala Val Leu

20 25 30

His Leu Pro Gly Cys Val Pro Cys Glu Asn Asp Asn Gly Thr Leu

35 40 45

His Cys Trp Ile Gln Val Thr Pro Asn Val Ala Val Lys His Arg

50 55 60

Gly Ala Leu Thr His Asn Leu Arg Ala His Val Asp Met Ile Val

65 70 75

Met Ala Ala Thr Val Cys Ser Ala Leu Tyr Val Gly Asp Met Cys

80 85 90

Gly Ala Val Met Ile Val Ser Gln Ala Phe Ile Ile Ser Pro Glu

95 100 105

Arg His Asn Phe Thr Gln Glu Cys Asn Cys Ser Ile Tyr Gln Gly

110 115 120

Arg Ile Thr Gly His Arg Met Ala Trp Asp Met Met Leu Asn Trp

125 130 135

Ser Pro Thr Leu Thr Met Ile Leu Ala Tyr Ala Ala Arg Val Pro

140 145 150

Glu Leu Val Leu Glu Val Val Phe Gly Gly His Trp Gly Val Val

155 160 165

Phe Gly Leu Ala Tyr Phe Ser Met Gln Gly Ala Trp Ala Lys Val

170 175 180

Ile Ala Ile Leu Leu Leu Val Ala Gly Val Asp Ala

185 190

(2) INFORMATION FOR SEQ ID NO: 84:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: T8 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84: 


Val Glu Val Arg Asn Thr Ser Phe Ser Tyr Tyr Ala Thr Asn Asp

5 10 15

Cys Ser Asn Asn Ser Ile Thr Trp Gln Leu Thr Asn Ala Val Leu

20 25 30

His Leu Pro Gly Cys Val Pro Cys Glu Asn Asp Asn Gly Thr Leu

35 40 45

Arg Cys Trp Ile Gln Val Thr Pro Asn Val Ala Val Lys His Arg

50 55 60

Gly Ala Leu Thr His Asn Leu Arg Thr His Val Asp Val Ile Val

65 70 75

Met Ala Ala Thr Val Cys Ser Ala Leu Tyr Val Gly Asp Val Cys

80 85 90

Gly Ala Val Met Ile Ala Ser Gln Ala Phe Ile Ile Ser Pro Glu

95 100 105

Arg His Asn Phe Thr Gln Glu Cys Asn Cys Ser Ile Tyr Gln Gly

110 115 120

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Leu Asn Trp

125 130 135

Ser Pro Thr Leu Thr Met Ile Leu Ala Tyr Ala Ala Arg Val Pro

140 145 150

Glu Leu Val Leu Glu Val Val Phe Gly Gly His Trp Gly Val Val

155 160 165

Phe Gly Leu Ala Tyr Phe Ser Met Gln Gly Ala Trp Ala Lys Val

170 175 180

Ile Ala Ile Leu Leu Leu Val Ala Gly Val Asp Ala

185 190


(2) INFORMATION FOR SEQ ID NO: 85: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: S83

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85:


Val Glu Val Lys Asp Thr Gly Asp Ser Tyr Met Pro Thr Asn Asp

5 10 15

Cys Ser Asn Ser Ser Ile Val Trp Gln Leu Glu Gly Ala Val Leu

20 25 30

His Thr Pro Gly Cys Val Pro Cys Glu Arg Thr Ala Asn Val Ser

35 40 45

Arg Cys Trp Val Pro Val Ala Pro Asn Leu Ala Ile Ser Gln Pro

50 55 60

Gly Ala Leu Thr Lys Gly Leu Arg Ala His Ile Asp Ile Ile Val

65 70 75

Met Ser Ala Thr Val Cys Ser Ala Leu Tyr Val Gly Asp Val Cys

80 85 90

Gly Ala Leu Met Leu Ala Ala Gln Val Val Val Val Ser Pro Gln

95 100 105

His His Thr Phe Val Gln Glu Cys Asn Cys Ser Ile Tyr Pro Gly

110 115 120

Arg Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp

125 130 135

Ser Pro Thr Thr Thr Met Leu Leu Ala Tyr Leu Val Arg Ile Pro

140 145 150

Glu Val Ile Leu Asp Ile Val Thr Gly Gly His Trp Gly Val Met

155 160 165

Phe Gly Leu Ala Tyr Phe Ser Met Gln Gly Ser Trp Ala Lys Val

170 175 180

Ile Val Ile Leu Leu Leu Thr Ala Gly Val Glu Ala

185 190

(2) INFORMATION FOR SEQ ID NO: 86:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 192 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: DK12

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86:

Leu Glu Trp Arg Asn Val Ser Gly Leu Tyr Val Leu Thr Asn Asp

5 10 15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Ala Asp Asp Val Ile Leu

20 25 30

His Thr Pro Gly Cys Val Pro Cys Val Gln Asp Gly Asn Thr Ser

35 40 45

Thr Cys Trp Thr Ser Val Thr Pro Thr Val Ala Val Arg Tyr Val

50 55 60

Gly Ala Thr Thr Ala Ser Ile Arg Ser His Val Asp Leu Leu Val

65 70 75

Gly Ala Ala Thr Met Cys Ser Ala Leu Tyr Val Gly Asp Val Cys

80 85 90

Gly Ala Val Phe Leu Val Gly Gln Ala Phe Thr Phe Arg Pro Arg

95 100 105

Arg His Gln Thr Val Gln Thr Cys Asn Cys Ser Leu Tyr Pro Gly

110 115 120

His Leu Ser Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp

125 130 135

Ser Pro Ala Val Gly Met Val Val Ala His Val Leu Arg Leu Pro

140 145 150

Gln Thr Leu Phe Asp Ile Ile Ala Gly Ala His Trp Gly Ile Met

155 160 165

Ala Gly Leu Ala Tyr Tyr Ser Met Gln Gly Asn Trp Ala Lys Val

170 175 180

Ala Ile Ile Met Val Met Phe Ser Gly Val Asp Ala

185 190


(2) INFORMATION FOR SEQ ID NO: 87: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: HK10 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: 


Leu Glu Trp Arg Asn Val Ser Gly Leu Tyr Val Leu Thr Asn Asp

5 10 15

Cys Pro Asn Ser Ser Ile Val Tyr Glu Ala Asp Asp Val Ile Leu

20 25 30

His Thr Pro Gly Cys Val Pro Cys Val Gln Asp Gly Asn Thr Ser

35 40 45

Thr Cys Trp Thr Ser Val Thr Pro Thr Val Ala Val Arg Tyr Val

50 55 60

Gly Ala Thr Thr Ala Ser Ile Arg Ser His Val Asp Leu Leu Val

65 70 75

Gly Ala Ala Thr Met Cys Ser Ala Leu Tyr Val Gly Asp Met Cys

80 85 90

Gly Ala Val Phe Leu Val Gly Gln Ala Phe Thr Phe Arg Pro Arg

95 100 105

Arg His Gln Thr Val Gln Thr Cys Asn Cys Ser Leu Tyr Pro Gly

110 115 120

His Leu Ser Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp

125 130 135

Ser Pro Ala Val Gly Met Val Val Ala His Val Leu Arg Leu Pro

140 145 150

Gln Thr Leu Phe Asp Ile Ile Ala Gly Ala His Trp Gly Ile Leu

155 160 165

Ala Gly Leu Ala Tyr Tyr Ser Met Gln Gly Asn Trp Ala Lys Val

170 175 180

Ala Ile Ile Met Val Met Phe Ser Gly Val Asp Ala

185 190






(2) INFORMATION FOR SEQ ID NO: 88: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S2 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88: 


Leu Glu Trp Arg Asn Thr Ser Gly Leu Tyr Val Leu Thr Asn Asp

5 10 15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Ala Asp Asp Val Ile Leu

20 25 30

His Thr Pro Gly Cys Val Pro Cys Val Gln Asp Gly Asn Thr Ser

35 40 45

Thr Cys Trp Thr Pro Val Thr Pro Thr Val Ala Val Arg Tyr Val

50 55 60

Gly Ala Thr Thr Ala Ser Ile Arg Ser His Val Asp Leu Leu Val

65 70 75

Gly Ala Ala Thr Met Cys Ser Ala Leu Tyr Val Gly Asp Met Cys

80 85 90

Gly Ala Val Phe Leu Val Gly Gln Ala Phe Thr Phe Arg Pro Arg

95 100 105

Arg His Gln Thr Val Gln Thr Cys Asn Cys Ser Leu Tyr Pro Gly

110 115 120

His Leu Ser Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp

125 130 135

Ser Pro Ala Val Gly Met Val Val Ala His Val Leu Arg Leu Pro

140 145 150

Gln Thr Val Phe Asp Ile Ile Ala Gly Ala His Trp Gly Ile Leu

155 160 165

Ala Gly Leu Ala Tyr Tyr Ser Met Gln Gly Asn Trp Ala Lys Val

170 175 180

Ala Ile Ile Met Val Met Phe Ser Gly Val Asp Ala

185 190


(2) INFORMATION FOR SEQ ID NO: 89: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S52 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89:



Leu Glu Trp Arg Asn Thr Ser Gly Leu Tyr Val Leu Thr Asn Asp

5 10 15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Ala Asp Asp Val Ile Leu

20 25 30

His Thr Pro Gly Cys Val Pro Cys Val Gln Asp Gly Asn Thr Ser

35 40 45

Met Cys Trp Thr Pro Val Thr Pro Thr Val Ala Val Arg Tyr Val

50 55 60

Gly Ala Thr Thr Ala Ser Ile Arg Ser His Val Asp Leu Leu Val

65 70 75

Gly Ala Ala Thr Leu Cys Ser Ala Leu Tyr Val Gly Asp Met Cys

80 85 90

Gly Ala Val Phe Leu Val Gly Gln Ala Phe Thr Phe Arg Pro Arg

95 100 105

Arg His Gln Thr Val Gln Thr Cys Asn Cys Ser Leu Tyr Pro Gly

110 115 120

His Val Ser Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp

125 130 135

Ser Pro Ala Val Gly Met Val Val Ala His Ile Leu Arg Leu Pro

140 145 150

Gln Thr Leu Phe Asp Ile Leu Ala Gly Ala His Trp Gly Ile Leu

155 160 165

Ala Gly Leu Ala Tyr Tyr Ser Met Gln Gly Asn Trp Ala Lys Val

170 175 180

Ala Ile Val Met Ile Met Phe Ser Gly Val Asp Ala

185 190


(2) INFORMATION FOR SEQ ID NO: 90: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S54 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90: 


Leu Glu Trp Arg Asn Thr Ser Gly Leu Tyr Ile Leu Thr Asn Asp

5 10 15

Cys Ser Asn Ser Ser Ile Val Tyr Glu Ala Asp Asp Val Ile Leu

20 25 30

His Thr Pro Gly Cys Val Pro Cys Val Gln Asp Gly Asn Thr Ser

35 40 45

Thr Cys Trp Thr Pro Val Thr Pro Thr Val Ala Val Arg Tyr Val

50 55 60

Gly Ala Thr Thr Ala Ser Ile Arg Ser His Val Asp Leu Leu Val

65 70 75

Gly Ala Ala Thr Leu Cys Ser Ala Leu Tyr Val Gly Asp Met Cys

80 85 90

Gly Ala Val Phe Leu Val Gly Gln Ala Phe Thr Phe Arg Pro Arg

95 100 105

Arg His Gln Thr Val Gln Thr Cys Asn Cys Ser Leu Tyr Pro Gly

110 115 120

His Leu Ser Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp

125 130 135

Ser Pro Ala Val Gly Met Val Val Ala His Ile Leu Arg Leu Pro

140 145 150

Gln Thr Leu Phe Asp Ile Leu Ala Gly Ala His Trp Gly Ile Leu

155 160 165

Ala Gly Leu Ala Tyr Tyr Ser Met Gln Gly Asn Trp Ala Lys Val

170 175 180

Ala Ile Ile Met Ile Met Phe Ser Gly Val Asp Ala

185 190

(2) INFORMATION FOR SEQ ID NO: 91:


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z4 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91: 

Glu His Tyr Arg Asn Ala Ser Gly Ile Tyr His Ile Thr Asn Asp

5 10 15

Cys Pro Asn Ser Ser Ile Val Tyr Glu Ala Asp His His Ile Leu

20 25 30

His Leu Pro Gly Cys Val Pro Cys Val Met Thr Gly Asn Thr Ser

35 40 45

Arg Cys Trp Thr Pro Val Thr Pro Thr Val Ala Val Ala His Pro

50 55 60

Gly Ala Pro Leu Glu Ser Phe Arg Arg His Val Asp Leu Met Val

65 70 75

Gly Ala Ala Thr Leu Cys Ser Ala Leu Tyr Val Gly Asp Leu Cys

80 85 90

Gly Gly Ala Phe Leu Met Gly Gln Met Ile Thr Phe Arg Pro Arg

95 100 105

Arg His Trp Thr Thr Gln Glu Cys Asn Cys Ser Ile Tyr Thr Gly

110 115 120

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp

125 130 135

Ser Pro Thr Thr Thr Leu Leu Leu Ala Gln Ile Met Arg Val Pro

140 145 150

Thr Ala Phe Leu Asp Met Val Ala Gly Gly His Trp Gly Val Leu

155 160 165

Ala Gly Leu Ala Tyr Phe Ser Met Gln Gly Asn Trp Ala Lys Val

170 175 180

Val Leu Val Leu Phe Leu Phe Ala Gly Val Asp Ala

185 190






(2) INFORMATION FOR SEQ ID NO: 92: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z1 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 92: 


Val His Tyr Arg Asn Ala Ser Gly Val Tyr His Val Thr Asn Asp

5 10 15

Cys Pro Asn Thr Ser Ile Val Tyr Glu Thr Glu His His Ile Met

20 25 30

His Leu Pro Gly Cys Val Pro Cys Val Arg Thr Glu Asn Thr Ser

35 40 45

Arg Cys Trp Val Pro Leu Thr Pro Thr Val Ala Ala Pro Tyr Pro

50 55 60

Asn Ala Pro Leu Glu Ser Met Arg Arg His Val Asp Leu Met Val

65 70 75

Gly Ala Ala Thr Met Cys Ser Ala Phe Tyr Ile Gly Asp Leu Cys

80 85 90

Gly Gly Val Phe Leu Val Gly Gln Leu Phe Asp Phe Arg Pro Arg

95 100 105

Arg His Trp Thr Thr Gln Asp Cys Asn Cys Ser Ile Tyr Pro Gly

110 115 120

His Val Ser Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp

125 130 135

Ser Pro Thr Ser Ala Leu Ile Met Ala Gln Ile Leu Arg Ile Pro

140 145 150

Ser Ile Leu Gly Asp Leu Leu Thr Gly Gly His Trp Gly Val Leu

155 160 165

Ala Gly Leu Ala Phe Phe Ser Met Gln Ser Asn Trp Ala Lys Val

170 175 180

Ile Leu Val Leu Phe Leu Phe Ala Gly Val Glu Gly

185 190

(2) INFORMATION FOR SEQ ID NO: 93:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z6 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 93: 

Val Asn Tyr Arg Asn Ala Ser Gly Val Tyr His Val Thr Asn Asp 

5 10 15 

Cys Pro Asn Ser Ser Ile Val Tyr Glu Ala Glu His Gln Ile Leu

20 25 30

His Leu Pro Gly Cys Leu Pro Cys Val Arg Val Gly Asn Gln Ser

35 40 45

Arg Cys Trp Val Ala Leu Thr Pro Thr Val Ala Val Ser Tyr Ile

50 55 60

Gly Ala Pro Leu Asp Ser Leu Arg Arg His Val Asp Leu Met Val

65 70 75

Gly Ala Ala Thr Val Cys Ser Ala Leu Tyr Val Gly Asp Leu Cys

80 85 90

Gly Gly Ala Phe Leu Val Gly Gln Met Phe Ser Phe Gln Pro Arg

95 100 105

Arg His Trp Thr Thr Gln Asp Cys Asn Cys Ser Ile Tyr Ala Gly

110 115 120

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp

125 130 135

Ser Pro Thr Thr Thr Leu Leu Leu Ala Gln Val Met Arg Ile Pro

140 145 150

Ser Thr Leu Val Asp Leu Leu Ala Gly Gly His Trp Gly Val Leu

155 160 165

Val Gly Leu Ala Tyr Phe Ser Met Gln Ala Asn Trp Ala Lys Val

170 175 180

Ile Leu Val Leu Phe Leu Phe Ala Gly Val Asp Ala

185 190

(2) INFORMATION FOR SEQ ID NO: 94:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z7 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 94: 


Val Asn Tyr His Asn Ala Ser Gly Val Tyr His Ile Thr Asn Asp

5 10 15

Cys Pro Asn Ser Ser Ile Met Tyr Glu Ala Glu His His Ile Leu

20 25 30

His Leu Pro Gly Cys Val Pro Cys Val Arg Glu Gly Asn Gln Ser

35 40 45

Arg Cys Trp Val Ala Leu Thr Pro Thr Val Ala Ala Pro Tyr Ile

50 55 60

Gly Ala Pro Leu Glu Ser Ile Arg Arg His Val Asp Leu Met Val

65 70 75

Gly Ala Ala Thr Val Cys Ser Ala Leu Tyr Ile Gly Asp Leu Cys

80 85 90

Gly Gly Val Phe Leu Val Gly Gln Met Phe Ser Phe Gln Pro Arg

95 100 105

Arg His Trp Thr Thr Gln Asp Cys Asn Cys Ser Ile Tyr Ala Gly

110 115 120

His Val Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp

125 130 135

Ser Pro Thr Thr Thr Leu Val Leu Ala Gln Val Met Arg Ile Pro

140 145 150

Ser Thr Leu Val Asp Leu Leu Thr Gly Gly His Trp Gly Ile Leu

155 160 165

Ile Gly Val Ala Tyr Phe Cys Met Gln Ala Asn Trp Ala Lys Val

170 175 180

Ile Leu Val Leu Phe Leu Tyr Ala Gly Val Asp Ala

185 190


(2) INFORMATION FOR SEQ ID NO: 95: 

(i) SEQUENCE CHARACTERISTICS: 


(A) LENGTH: 192 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK13 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 95: 


Tyr Asn Tyr Arg Asn Ser Ser Gly Val Tyr His Val Thr Asn Asp

5 10 15

Cys Pro Asn Ser Ser Ile Val Tyr Glu Thr Asp Tyr His Ile Leu

20 25 30

His Leu Pro Gly Cys Val Pro Cys Val Arg Glu Gly Asn Lys Ser

35 40 45

Thr Cys Trp Val Ser Leu Thr Pro Thr Val Ala Ala Gln His Leu

50 55 60

Asn Ala Pro Leu Glu Ser Leu Arg Arg His Val Asp Leu Met Val

65 70 75

Gly Gly Ala Thr Leu Cys Ser Ala Leu Tyr Ile Gly Asp Val Cys

80 85 90

Gly Gly Val Phe Leu Val Gly Gln Leu Phe Thr Phe Gln Pro Arg

95 100 105

Arg His Trp Thr Thr Gln Asp Cys Asn Cys Ser Ile Tyr Thr Gly

110 115 120

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp

125 130 135

Ser Pro Thr Ala Thr Leu Val Leu Ala Gln Leu Met Arg Ile Pro

140 145 150

Gly Ala Met Val Asp Leu Leu Ala Gly Gly His Trp Gly Ile Leu

155 160 165

Val Gly Ile Ala Tyr Phe Ser Met Gln Ala Asn Trp Ala Lys Val 

170 175 180 

Ile Leu Val Leu Phe Leu Phe Ala Gly Val Asp Ala 

185 190 


(2) INFORMATION FOR SEQ ID NO: 96: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 96: 

Val Pro Tyr Arg Asn Ala Ser Gly Val Tyr His Val Thr Asn Asp 

5 10 15 

Cys Pro Asn Ser Ser Ile Val Tyr Glu Ala Asp Ser Leu Ile Leu 

20 25 30 

His Ala Pro Gly Cys Val Pro Cys Val Arg Gln Asp Asn Val Ser 

35 40 45 

Arg Cys Trp Val Gln Ile Thr Pro Thr Leu Ser Ala Pro Thr Phe 

50 55 60 

Gly Ala Val Thr Ala Pro Leu Arg Arg Ala Val Asp Tyr Leu Ala 

65 70 75 

Gly Gly Ala Ala Leu Cys Ser Ala Leu Tyr Val Gly Asp Ala Cys 

80 85 90 

Gly Ala Val Phe Leu Val Gly Gln Met Phe Thr Tyr Arg Pro Arg 

95 100 105 

Gln His Thr Thr Val Gln Asp Cys Asn Cys Ser Ile Tyr Ser Gly 

110 115 120 

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp 

125 130 135 

Ser Pro Thr Thr Ala Leu Leu Met Ala Gln Met Leu Arg Ile Pro 

140 145 150 

Gln Val Val Ile Asp Ile Ile Ala Gly Gly His Trp Gly Val Leu 

155 160 165 

Phe Ala Ala Ala Tyr Phe Ala Ser Ala Ala Asn Trp Ala Lys Val 

170 175 180 

Val Leu Val Leu Phe Leu Phe Ala Gly Val Asp Gly 

185 190 


(2) INFORMATION FOR SEQ ID NO: 97: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA4 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 97: 



Val Pro Tyr Arg Asn Ala Ser Gly Val Tyr His Val Thr Asn Asp 

5 10 15 

Cys Pro Asn Ser Ser Ile Val Tyr Glu Ala Asp Asn Leu Ile Leu 

20 25 30 

His Ala Pro Gly Cys Val Pro Cys Val Arg Gln Asp Asn Val Ser 

35 40 45 

Lys Cys Trp Val Gln Ile Thr Pro Thr Leu Ser Ala Pro Asn Leu 

50 55 60 

Gly Ala Val Thr Ala Pro Leu Arg Arg Ala Val Asp Tyr Leu Ala 

65 70 75 

Gly Gly Ala Ala Leu Cys Ser Ala Leu Tyr Val Gly Asp Ala Cys 

80 85 90 

Gly Ala Val Phe Leu Val Gly Gln Met Phe Thr Tyr Arg Pro Arg 

95 100 105 

Gln His Thr Thr Val Gln Asp Cys Asn Cys Ser Ile Tyr Ser Gly 

110 115 120 

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp 

125 130 135 

Ser Pro Thr Thr Ala Leu Leu Met Ala Gln Leu Leu Arg Ile Pro 

140 145 150 

Gln Val Val Ile Asp Ile Ile Ala Gly Gly His Trp Gly Val Leu 

155 160 165 

Phe Ala Ala Ala Tyr Phe Ala Ser Ala Ala Asn Trp Ala Lys Val 

170 175 180 

Ile Leu Val Leu Phe Leu Phe Ala Gly Val Asp Ala 

185 190 

(2) INFORMATION FOR SEQ ID NO: 98: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA5 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 98: 



Val Pro Tyr Arg Asn Ala Ser Gly Val Tyr His Val Thr Asn Asp 

5 10 15 

Cys Pro Asn Ser Ser Ile Val Tyr Glu Ala Asp Asn Leu Ile Leu 

20 25 30 

His Ala Pro Gly Cys Val Pro Cys Val Lys Glu Gly Asn Val Ser 

35 40 45 

Arg Cys Trp Val Gln Ile Thr Pro Thr Leu Ser Ala Pro Asn Leu 

50 55 60 

Gly Ala Val Thr Ala Pro Leu Arg Arg Val Val Asp Tyr Leu Ala 

65 70 75 

Gly Gly Ala Ala Leu Cys Ser Ala Leu Tyr Val Gly Asp Ala Cys 

80 85 90 

Gly Ala Val Phe Leu Val Gly Gln Met Phe Thr Tyr Arg Pro Arg 

95 100 105 

Gln His Thr Thr Val Gln Asp Cys Asn Cys Ser Ile Tyr Ser Gly 

110 115 120 

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp 

125 130 135 

Ser Pro Thr Thr Ala Leu Val Met Ala Gln Val Leu Arg Ile Pro 

140 145 150 

Gln Val Val Ile Asp Ile Ile Ala Gly Gly His Trp Gly Val Leu 

155 160 165 

Phe Ala Val Ala Tyr Phe Ala Ser Ala Ala Asn Trp Ala Lys Val 
170 175 180 

Val Leu Val Leu Phe Leu Phe Ala Gly Val Asp Gly 
185 190 

(2) INFORMATION FOR SEQ ID NO: 99: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA6 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 99: 


Val Pro Tyr Arg Asn Ala Ser Gly Val Tyr His Val Thr Asn Asp 

5 10 15 

Cys Pro Asn Ser Ser Ile Val Tyr Glu Ala Asp Asp Leu Ile Leu 

20 25 30 

His Ala Pro Gly Cys Val Pro Cys Val Arg Lys Asp Asn Val Ser 

35 40 45 

Arg Cys Trp Val His Ile Thr Pro Thr Leu Ser Ala Pro Ser Leu 

50 55 60 

Gly Ala Val Thr Ala Pro Leu Arg Arg Ala Val Asp Tyr Leu Ala 

65 70 75 

Gly Gly Ala Ala Leu Cys Ser Ala Leu Tyr Val Gly Asp Val Cys 

80 85 90 

Gly Ala Leu Phe Leu Val Gly Gln Met Phe Thr Tyr Arg Pro Arg 

95 100 105 

Gln His Ala Thr Val Gln Asp Cys Asn Cys Ser Ile Tyr Ser Gly 

110 115 120 

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp 

125 130 135 

Ser Pro Ala Thr Ala Leu Val Met Ala Gln Met Leu Arg Ile Pro 

140 145 150 

Gln Val Val Ile Asp Ile Ile Ala Gly Gly His Trp Gly Val Leu 

155 160 165 

Phe Ala Ala Ala Tyr Phe Ala Ser Ala Ala Asn Trp Ala Lys Val 

170 175 180 

Val Leu Val Leu Phe Leu Phe Ala Gly Val Asp Ala 

185 190 


(2) INFORMATION FOR SEQ ID NO: 100: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA7 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 100: 


Val Pro Tyr Arg Asn Ala Ser Gly Val Tyr His Val Thr Asn Asp 

5 10 15 

Cys Pro Asn Ser Ser Ile Val Tyr Glu Ala Asp Asn Leu Ile Leu 

20 25 30 

His Ala Pro Gly Cys Val Pro Cys Val Arg Gln Asn Asn Val Ser 

35 40 45 

Arg Cys Trp Val Gln Ile Thr Pro Thr Leu Ser Ala Pro Asn Leu 

50 55 60 

Gly Ala Val Thr Ala Pro Leu Arg Arg Ala Val Asp Tyr Leu Ala 

65 70 75 

Gly Gly Ala Ala Leu Cys Ser Ala Leu Tyr Val Gly Asp Ala Cys 

80 85 90 

Gly Ala Val Phe Leu Val Gly Gln Met Phe Ser Tyr Arg Pro Arg 

95 100 105 

Gln His Thr Thr Val Gln Asp Cys Asn Cys Ser Ile Tyr Ser Gly 

110 115 120 

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp 

125 130 135 

Ser Pro Thr Thr Ala Leu Val Met Ala Gln Leu Leu Arg Ile Pro 

140 145 150 

Gln Val Val Ile Asp Ile Ile Ala Gly Gly His Trp Gly Val Leu 

155 160 165 

Phe Ala Ala Ala Tyr Phe Ala Ser Ala Ala Asn Trp Ala Lys Val 

170 175 180 

Val Leu Val Leu Phe Leu Phe Ala Gly Val Asp Ala 

185 190 


(2) INFORMATION FOR SEQ ID NO: 101: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA13 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 101: 

Val Pro Tyr Arg Asn Ala Ser Gly Val Tyr His Val Thr Asn Asp 

5 10 15 


Cys Pro Asn Ser Ser Ile Val Tyr Glu Ala Asp Asp Leu Ile Leu 

20 25 30 

His Ala Pro Gly Cys Val Pro Cys Val Arg Gln Gly Asn Val Ser 

35 40 45 

Arg Cys Trp Val Gln Ile Thr Pro Thr Leu Ser Ala Pro Ser Leu 

50 55 60 

Gly Ala Val Thr Ala Pro Leu Arg Arg Ala Val Asp Tyr Leu Ala 

65 70 75 

Gly Gly Ala Ala Leu Cys Ser Ala Leu Tyr Val Gly Asp Ala Cys 

80 85 90 

Gly Ala Val Phe Leu Val Gly Gln Met Phe Thr Tyr Ser Pro Arg 

95 100 105 

Arg His Asn Val Val Gln Asp Cys Asn Cys Ser Ile Tyr Ser Gly 

110 115 120 

His Ile Thr Gly His Arg Met Ala Trp Asp Met Met Met Asn Trp 

125 130 135 

Ser Pro Thr Thr Ala Leu Val Met Ala Gln Leu Leu Arg Ile Pro 

140 145 150 

Gln Val Val Ile Asp Ile Ile Ala Gly Ala His Trp Gly Val Leu 

155 160 165 

Phe Ala Ala Ala Tyr Tyr Ala Ser Ala Ala Asn Trp Ala Lys Val 

170 175 180 

Val Leu Val Leu Phe Leu Phe Ala Gly Val Asp Ala 

185 190 


(2) INFORMATION FOR SEQ ID NO: 102: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 192 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: HK2 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 102: 



Leu 

Thr 

Tyr 

Gin 

Asn 

5 

Ser 

Ser 

Gin 

Leu 

Tyr 

10 

His 

Leu 

Thr 

Asn 

Asp 

15 

Cys 

Pro 

Asn 

Ser 

Ser 

20 

lie 

Val 

Leu 

Glu 

Ala 

25 

Asp 

Ala 

Met 

lie 

Leu 

30 

His 

Leu 

Pro 

Gin 

Cys 

35 

Leu 

Pro 

Cys 

Val 

Arg 

40 

Val 

Asp 

Asp 

Arg 

Ser 

45 

Thr 

Cys 

Trp 

His 

Ala 

50 

Val 

Thr 

Pro 

Thr 

Leu 

55 

Ala 

lie 

Pro 

Asn 

Ala 

60 

Ser 

Thr 

Pro 

Ala 

Thr 

65 

Gin 

Phe 

Arg 

Arg 

His 

70 

Val 

Asp 

Leu 

Leu 

Ala 

75 

Gin 

Ala 

Ala 

Val 

Val 

Cys 

Ser 

Ser 

Leu 

Tyr 

lie 

Gin 

Asp 

Leu 

Cys 


80 85 90 

Gin 

Ser 

Leu 

Phe 

Leu 

Ala 

Gin 

Gin 

Leu 

Phe 

Thr 

Phe 

Gin 

Pro 

Arg 





95 





100 





105 

Arg 

His 

Trp 

Thr 

Val 

Gin 

Asp 

Cys 

Asn 

Cys 

Ser 

lie 

Tyr 

Thr 

Gin 




110 





115 





120 

His 

Val 

Thr 

Gin 

His 

Arg 

Met 

Ala 

Trp 

Asp 

Met 

Met 

Met 

Asn 

Trp 





125 





130 





135 

Ser 

Pro 

Thr 

Thr 

Thr 

Leu 

Val 

Leu 

Ser 

Ser 

lie 

Leu 

Arg 

Val 

Pro 





140 





145 





150 

Glu 

lie 

Cys 

Ala 

Ser 

Val 

He 

Phe 

Gin 

Gin 

His 

Trp 

Gin 

lie 

Leu 




155 





160 





165 

Leu 

Ala 

Val 

Ala 

Tyr 

Phe 

Gin 

Met 

Ala 

Gin 

Asn 

Trp 

Leu 

Lys 

Val 





170 





175 





180 

Leu 

Ala 

Val 

Leu 

Phe 

Leu 

Phe 

Ala 

Gin 

Val 

Glu 

Ala 








185 





190 







(2) INFORMATION FOR SEQ ID NO: 103: 




(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK7 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 103: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 

AAC ACC AAC CGT CGC CCA CAG GAC GTC AAG TTC CCG GGT 78 

GGC GGT CAG ATC GTT GGT GGA GTT TAC TTG TTG CCG CGC 117 

AGG GGC CCT AGA TTG GGT GTG CGC GCG CCG AGG AAG ACT 156 

TCC GAG CGG TCG CAA CCT CGA GGT AGA CGT CAG CCT ATC 195 

CCC AAG GCA CGT CGG CCC GAG GGC AGG ACC TGG GCT CAG 234 

CCC GGG TAC CCT TGG CCC CTC TAT GGC AAT GAG GGC TGC 273 

GGG TGG GCG GGA TGG CTC CTG TCT CCC CGT GGC TCT CGG 312 

CCT AGC TGG GGC CCC ACA GAC CCC CGG CGC AGG TCG CGC 351 

AAT TTG GGT AAA GTC ATC GAT ACC CTT ACG TGC GGC TTC 390 

GCC GAC CTC ATG GGG TAC ATA CCG CTC GTC GGC GCC CCT 429 

CTT GGA GGC GCT GCC AGG GCC CTG GCG CAT GGC GTC CGG 468 

GTT CTG GAA GAC GGC GTG AAC TAT GCA ACA GGG AAC CTT 507 

CCT GGT TGC TCT TTC TCT ATC TTC CTT TTG GCC CTG CTC 546 

TCT TGC CTG ACC GTG CCC GCT TCG GCC 573 


(2) INFORMATION FOR SEQ ID NO: 104: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 




(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: US11 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 104: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 

AAC ACC AAC CGT CGC CCA CAG GAC GTC AAG TTC CCG GGT 78 

GGC GGT CAG ATC GTT GGT GGA GTT TAC TTG TTG CCG CGC 117 

AGG GGC CCT AGA TTG GGT GTG CGC GCG ACG AGG AAG ACT 156 

TCC GAG CGG TCG CAA CCT CGA GGT AGA CGT CAG CCT ATC 195 

CCC AAG GCA CGT CGG CCC GAG GGC AGG ACC TGG GCT CAG 234 

CCC GGG TAC CCT TGG CCC CTC TAT GGC AAT GAG GGC TGC 273 

GGG TGG GCG GGA TGG CTC CTG TCT CCC CGT GGC TCT CGG 312 

CCT AGC TGG GGC CCC ACG GAC CCC CGG CGT AGG TCG CGC 351 

AAT TTG GGT AAG GTC ATC GAT ACC CTT ACG TGC GGC TTC 390 

GCC GAC CTC ATG GGG TAC ATA CCG CTC GTC GGC GCC CCT 429 

CTC GGA GGC GCT GCC AGG GCC CTG GCG CAT GGC GTC CGG 468 

GTT CTG GAA GAC GGC GTG AAC TAT GCA ACA GGG AAC CTT 507 

CCT GGT TGC TCT TTC TCT ATC TTC CTT CTG GCC CTG CTC 546 

TCT TGC CTG ACT GTG CCC GCT TCA GCC 573 


(2) INFORMATION FOR SEQ ID NO: 105: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S14 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 105: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 

AAC ACC AAC CGT CGC CCA CAG GAC GTC AAG TTC CCG GGT 78 

GGC GGT CAG ATC GTT GGT GGA GTT TAC TTG TTG CCG CGC 117 

AGG GGC CCT AGA TTG GGT GTG CGC GCG ACG AGG AAG ACT 156 

TCC GAG CGG TCG CAA CCT CGA GGT AGA CGT CAG CCT ATC 195 

CCC AAG GCA CGT CGG CCC GAG GGC AGG ACC TGG GCT CAG 234 

CCC GGG TAT CCT TGG CCC CTC TAT GGC AAT GAG GGC TGC 273 

GGG TGG GCG GGA TGG CTC CTG TCT CCC CGT GGC TCT CGG 312 

CCT AGC TGG GGC CCC ACA GAC CCC CGG CGT AGG TCG CGC 351 

AAT TTG GGT AAG GTC ATC GAT ACC CTC ACG TGC GGC TTC 390 

GCC GAC CTC ATG GGG TAC ATA CCG CTC GTC GGC GCC CCC 429 

CTC GGG GGC GCT GCC AGG GCC CTG GCG CAT GGC GTC CGG 468 

GTT CTG GAA GAC GGC GTG AAC TAT GCA ACA GGG AAC CTT 507 

CCT GGT TGC TCT TTC TCT ATC TTC CTC CTA GCC CTG CTT 546 

TCT TGC CTG ACT GTG CCC GCT TCA GCC 573 


(2) INFORMATION FOR SEQ ID NO: 106: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SW1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 106: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 

AAC ACC AAC CGT CGC CCA CAG GAC GTC AAG TTC CCG GGT 78 

GGC GGT CAG ATC GTT GGT GGA GTT TAC TTG TTG CCG CGC 117 

AGG GGC CCT AGA TTG GGT GTG CGC GCG ACG AGG AAG ACT 156 

TCC GAG CGG TCG CAA CCT CGA GGT AGA CGT CAG CCT ATC 195 

CCC AAG GCG CGT CGG CCC GAG GGC AGG ACC TGG GCT CAG 234 

CCC GGG TAT CCT TGG CCC CTC TAT GGC AAT GAG GGC TGC 273 

GGA TGG GCG GGA TGG CTC CTG TCC CCC CGT GGC TCT CGG 312 

CCT AGC TGG GGC CCT ACA GAC CCC CGG CGT AGG TCG CGC 351 

AAT TTG GGT AAG GTC ATC GAT ACC CTC ACG TGC GGC TTC 390 

GCC GAC CTC ATG GGG TAC ATT CCG CTC GTC GGC GCC CCT 429 

CTT GGA GGC GCT GCC AGG GCC CTG GCG CAT GGC GTC CGG 468 

GTT CTG GAA GAC GGC GTG AAC TAT GCA ACA GGG AAC CTT 507 

CCT GGT TGC TCT TTC TCT ATC TTC CTT CTG GCC CTG CTT 546 

TCT TGC CTG ACA GTG CCC GCG TCA GCC 573 






(2) INFORMATION FOR SEQ ID NO: 107: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 107: 


ATG AGC ACA AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 

AAC ACC AAC CGT CGC CCA CAG GAC GTT AAG TTC CCG GGT 78 

GGC GGT CAG ATC GTT GGT GGA GTT TAC TTG TTG CCG CGC 117 

AGG GGC CCT AGA TTG GGT GTG CGC GCG ACG AGG AAG ACT 156 

TCC GAG CGG TCG CAA CCT CGC GGT AGA CGT CAG CCT ATC 195 

CCC AAG GCG CGT CGG CCC GAG GGC AGG ACC TGG GCT CAG 234 

CCC GGG TAC CCT TGG CCC CTC TAT GGC AAT GAG GGC TGC 273 

GGG TGG GCG GGA TGG CTC CTG TCC CCC CGT GGC TCC CGG 312 

CCT AGC TGG GGC CCT ACA GAC CCC CGG CGT AGG TCG CGC 351 

AAT TTG GGC AAA GTC ATC GAT ACC CTC ACG TGC GGC TTC 390 

GCC GAC CTC ATG GGG TAC ATT CCG CTC GTC GGC GCC CCT 429 

CTC GGA GGC GCT GCC AGG GCC CTG GCG CAT GGC GTC CGG 468 

GTT CTG GAA GAC GGC GTG AAC TAT GCA ACA GGG AAC CTT 507 

CCT GGT TGC TCT TTC TCT ATC TTC CTT CTG GCC CTG CTC 546 

TCT TGT CTG ACT GTG CCC GCG TCA GCT 573 

(2) INFORMATION FOR SEQ ID NO: 108: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DR4 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 108: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 

AAC ACC AAC CGT CGC CCA CAG GAC GTC AAG TTC CCG GGT 78 

GGC GGT CAG ATC GTT GGT GGA GTT TAC TTG TTG CCG CGC 117 

AGG GGC CCT AGA TTG GGT GTG CGC GCG ACG AGG AAG ACT 156 

TCC GAG CGG TCG CAA CCT CGA GGT AGA CGT CAG CCT ATC 195 

CCC AAG GCG CGT CGG CCC GAG GGC AGG ACC TGG GCT CAG 234 

CCC GGG TAC CCT TGG CCC CTC TAT GGC AAT GAG GGC TGC 273 

GGG TGG GCG GGA TGG CTC CTG TCC CCC CGT GGC TCT CGG 312 

CCT AGC TGG GGC CCC ACA GAC CCC CGG CGT AGG TCG CGC 351 

AAT TTG GGT AAG GTC ATC GAC ACC CTC ACG TGC GGC TTC 390 

GCC GAC CTC ATG GGG TAC ATC CCG CTC GTC GGC GCC CCC 429 

CTT GGG GGC GCT GCC AGG GCC CTG GCG CAT GGC GTC CGA 468 

GTT CTG GAA GAC GGC GTG AAC TAT GCA ACA GGG AAT CTT 507 

CCT GGT TGC TCT TTC TCT ATC TTC CTT TTG GCT TTG CTC 546 

TCT TGC TTG ACC GTG CCC GCA TCG GCC 573 

(2) INFORMATION FOR SEQ ID NO: 109: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA10 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 109: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 

AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78 

GGT GGT CAG ATC GTT GGT GGA GTC TAT CTG TTG CCG CGC 117 

AGG GGC CCC AGG TTG GGT GTG CGC GCG ACG AGG AAG ACT 156 

TCC GAG CGG TCG CAA CCT CGT GGA AGG CGA CAA CCT ATC 195 

CCC AAG GCT CGC CAG CCC GAG GGC AGG ACC TGG GCC CAG 234 

CCC GGG TAC CCT TGG CCC CTC TAT GGC AAT GAG GGC TTG 273 

GGG TGG GCA GGA TGG CTC CTG TCA CCC CGT GGC TCT CGG 312 

CCT AGT TGG GGC CCC ACG GAC CCC CGG CGT AGG TCG CGT 351 

AAT TTG GGT AAG GTC ATC GAT ACC CTC ACA TGC GGC TTC 390 

GCC GAC CTC ATG GGG TAC ATT CCG CTC GTC GGC GCC CCT 429 

TTA GGG GGC GCT GCC AGG GCC TTG GCG CAT GGC GTC CGG 468 

GTT CTG GAA GAC GGC GTG AAC TAT GCA ACA GGG AAT TTG 507 

CCC GGT TGC CCT TTC TCT ATC TTC CTC TTG GCT TTG CTG 546 

TCC TGT TTA ACC ATC CCA GCT TCC GCT 573 


(2) INFORMATION FOR SEQ ID NO: 110: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S45 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 110: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA CAA ACC AAA CGT 39 

AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGT 78 

GGC GGT CAG ATC GTT GGT GGA GTT TAC CTG TTG CCG CGC 117 

AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT AGG AAG ACT 156 

TCC GAG CGG TCA CAA CCT CGT GGA CGG CGA CAA CCT ATC 195 

CCC AAG GCT CGC CGG CCC GAG GGC AGG GCC TGG GCC CAG 234 

CCC GGG CAT CCT TGG CCC CTC TAT GGC AAT GAG GGC TTG 273 

GGG TGG GCA GGA TGG CTC CTG TCA CCC CGT GGC TCC CGG 312 

CCT AGT TGG GGC CCC ACG GAC CCC CGG CGT AGG TCG CGC 351 

AAT TTG GGT AAG GTC ATC GAT ACC CTC ACG TGC GGC TTC 390 

GCC GAC CTC ATG GGG TAC ATT CCG CTC GTC GGC GCC CCC 429 

CTA GGG GGC GCT GCC AGA GCC TTG GCG CAT GGC GTC CGG 468 

GTT CTG GAG GAC GGC GTG AAC TAT GCA ACA GGG AAT CTG 507 

CCC GGT TGC TCT TTC TCT ATC TTC CTC TTG GCT CTG CTG 546 

TCC TGC TTG ACC ATC CCA GCT TCC GCT 573 


(2) INFORMATION FOR SEQ ID NO: 111: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: D1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 111: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 

AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78 

GGT GGT CAG ATC GTT GGT GGA GTT TAC CTG TTG CCG CGC 117 

AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT AGG AAG ACT 156 

TCC GAG CGG TCG CAA CCT CGT GGA AGG CGA CAA CCT ATC 195 

CCC AAG GCT CGC CGG CCC GAG GGT AGG GCC TGG GCT CAG 234 

CCC GGG TAC CCT TGG CCC CTC TAT GGC AAC GAG GGC TTG 273 

GGG TGG GCA GGA TGG CTC CTG TCA CCC CGC GGC TCC CGG 312 

CCT AGT TGG GGC CCC ACC GAC CCC CGG CGT AGG TCG CGT 351 

AAT TTG GGT AAG GTC ATC GAT ACC CTC ACA TGC GGC TTC 390 

GCC GAC CTC ATG GGG TAC ATC CCG CTC GTC GGC GCC CCC 429 

CTA GGG GGT GCT GCC AGG GCC CTG GCG CAT GGC GTC CGG 468 

GTT CTG GAG GAC GGC GTG AAT TAT GCA ACA GGG AAT TTG 507 

CCC GGT TGC TCT TTC TCT ATC TTC CTC TTG GCT TTG CTG 546 

TCC TGT TTG ACC ATC CCA GCT TCC GCT 573 






(2) INFORMATION FOR SEQ ID NO: 112: 




(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: US6 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 112: 

ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 

AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78 

GGT GGT CAG ATC GTT GGT GGA GTT TAC CTG TTG CCG CGC 117 

AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT AGG AAG ACT 156 

TCC GAG CGG TCG CAA CCT CGT GGA AGG CGA CAA CCT ATC 195 

CCC AAG GCT CGC CGG CCC GAG GGC AGG GCC TGG GCT CAG 234 

CCC GGG TAC CCT TGG CCC CTC TAT GGC AAC GAG GGC ATG 273 

GGG TGG GCA GGA TGG CTC CTG TCA CCC CGT GGC TCC CGG 312 

CCT AGT TGG GGC CCC ACG GAC CCC CGG CGT AGG TCG CGT 351 

AAT TTG GGT AAG GTC ATC GAT ACC CTC ACA TGC GGC TTC 390 

GCC GAC CTC ATG GGG TAC ATT CCG CTC GTC GGC GCC CCC 429 

CTA GGG GGC GCT GCC AGG GCC TTG GCG CAT GGC GTC CGG 468 

GTT CTG GAG GAC GGC GTG AAC TAT GCA ACA GGG AAC TTG 507 

CCC GGT TGC TCT TTC TCT ATC TTC CTC TTG GCT TTG CTG 546 

TCC TGT TTG ACC ATT CCA GCT TCC GCT 573 


(2) INFORMATION FOR SEQ ID NO: 113: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 




(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: P10 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 113: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 

AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78 

GGT GGT CAG ATC GTT GGT GGA GTT TAC CTG TTG CCG CGC 117 

AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT AGG AAG ACT 156 

TCC GAG CGG TCG CAA CCT CGT GGA AGG CGA CAA CCT ATC 195 

CCC AAG GCT CGC CGG CCC GAG GGC AGG GCC TGG GCT CAG 234 

CCC GGG TAC CCT TGG CCC CTC TAT GGC AAT GAG GGC TTG 273 

GGG TGG GCA GGA TGG CTC CTG TCA CCC CGT GGC TCT CGG 312 

CCT AGT TGG GGC CCC ACG GAC CCC CGG CGT AGG TCG CGT 351 

AAT TTG GGT AAG GTC ATC GAT ACC CTC ACA TGC GGC TTC 390 

GCC GAC CTC ATG GGG TAC ATT CCG CTC GTC GGC GCC CCC 429 

CTA GGG GGC GCT GCC AGG GCC CTG GCG CAT GGC GTC CGG 468 

GTT CTG GAG GAC GGC GTG AAC TAT GCA ACA GGG AAT CTG 507 

CCC GGT TGC TCT TTC TCT ATC TTC CTC TTG GCT TTG CTG 546 

TCC TGC CTG ACC ATC CCA GCG TCC GCT 573 


(2) INFORMATION FOR SEQ ID NO: 114: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK1 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 114: 

ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 

AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78 

GGT GGT CAG ATC GTT GGT GGA GTT TAC CTG TTG CCG CGC 117 

AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT AGG AAG ACT 156 

TCC GAG CGG TCG CAA CCT CGT GGA AGG CGA CAA CCT ATC 195 

CCC AAG GCT CGC CGG CCC GAG GGC AGG GCC TGG GCT CAG 234 

CCC GGG TAC CCT TGG CCC CTC TAT GGC AAT GAG GGC ATG 273 

GGG TGG GCA GGA TGG CTC CTG TCA CCC CGC GGC TCT CGG 312 

CCT AGT TGG GGC CCC AAC GAC CCC CGG CGT AGG TCG CGT 351 

AAT TTG GGT AAG GTC ATC GAT ACC CTC ACA TGC GGC TTC 390 

GCC GAC CTC ATG GGG TAC ATT CCG CTC GTC GGC GCC CCC 429 

CTA GGG GGC GCT GCC AGG GCC CTG GCG CAT GGC GTC CGG 468 

GTT CTG GAG GAC GGC GTG AAC TAC GCA ACA GGG AAT TTG 507 

CCC GGT TGC TCT TTC TCT ATC TTC CTC TTG GCT CTG TTG 546 

TCC TGT TTG ACC ATC CCA GCT TCC GCC 573 

(2) INFORMATION FOR SEQ ID NO: 115: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: T10 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 115: 

ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 

AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78 

GGT GGT CAG ATC GTT GGT GGA GTT TAC CTG TTG CCG CGC 117 

AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT AGG AAG ACT 156 

TCC GAG CGG TCG CAA CCT CGT GGA AGG CGA CAG CCT ATC 195 

CCC AAG GCT CGC CAG CCC GAG GGC AGG GCC TGG GCT CAG 234 

CCC GGG TAC CCT TGG CCC CTC TAT GGC AAT GAG GGC ATG 273 

GGG TGG GCA GGA TGG CTC CTG TCA CCC CGT GGC TCC CGG 312 

CCT AGT TGG GGC CCC ACA GAC CCC CGG CGT AGG TCG CGT 351 

AAT TTG GGT AAG GTC ATC GAT ACC CTC ACA TGC GGC TTC 390 

GCC GAC CTC ATG GGG TAC ATT CCG CTC GTC GGC GCC CCC 429 

CTA GGG GGC GCT GCC AGG GCT CTG GCA CAT GGT GTC CGG 468 

GTT CTG GAG GAC GGC GTG AAC TAT GCA ACA GGG AAT TTG 507 

CCC GGT TGC TCT TTT TCT ATC TTC CTC TTG GCT CTG CTG 546 

TCT TGT CTG ACC ATC CCA GCT TCC GCT 573 

(2) INFORMATION FOR SEQ ID NO: 116: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SW2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 116: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 

AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78 

GGT GGC CAG ATC GTT GGT GGA GTT TAC CTG TTG CCG CGC 117 

AGG GGC CCC CGG TTG GGT GTG CGC GCG ACT AGG AAG ACT 156 

TCC GAG CGG TCG CAA CCT CGT GGA AGG CGA CAA CCT ATC 195 

CCC AAG GCT CGC CAG CCC GAG GGC AGG GCC TGG GCT CAG 234 

CCT GGG TAC CCC TGG CCC CTC TAT GGC AAT GAG GGC ATG 273 

GGA TGG GCA GGA TGG CTC CTG TCC CCC CGC GGC TCT CGG 312

CCT AGT TGG GGC CCC ACT GAC CCC CGG CGT AGG TCG CGT 351 

AAT TTG GGT AAG GTC ATC GAT ACC CTC ACA TGC GGC TTC 390 

GCC GAC CTC ATG GGG TAC ATT CCG CTC GTC GGC GCC CCC 429 

CTA GGG GGC GCT GCC AGG GCC CTG GCG CAT GGC GTC CGG 468 

GTC CTG GAG GAC GGC GTG AAC TAT GCA ACA GGG AAT CTG 507 

CCC GGT TGC TCC TTT TCT ATC TTC CTC TTG GCT TTG CTG 546 

TCC TGT CTG ACC ATC CCA GCT TCC GCT 573


(2) INFORMATION FOR SEQ ID NO: 117: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: IND3 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 117: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 

AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78 

GGT GGC CAG ATC GTT GGT GGA GTT TAC CTG TTG CCG CGC 117 

AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT AGG AAG ACT 156 

TCC GAG CGG TCG CAA CCT CGT GGA AGG CGA CAA CCT ATC 195 

CCC AAG GCT CGC CGG CCC GAG GGT AGG GCC TGG GCT CAG 234

CCC GGG TAC CCT TGG CCC CTC TAT GGC AAT GAG GGC TTG 273

GGG TGG GCA GGA TGG CTC CTG TCA CCC CGC GGT TCT CGG 312

CCT AGT TGG GGC CCC ACA GAC CCC CGG CGT AGG TCG CGT 351

AAT TTG GGT AAA GTC ATC GAT ACC CTC ACA TGC GGC TTC 390

GCC GAC CTC ATG GGG TAC ATC CCG CTC GTC GGC GCC CCC 429

CTA GGG GGC GCT GCC AGG GCC CTG GCG CAT GGC GTC CGG 468

GTC CTG GAG GAC GGC GTG AAC TAT GCA ACA GGG AAC TTG 507

CCC GGT TGC TCT TTC TCT ATC TTC CTT TTA GCT TTG CTA 546

TCC TGT TTG ACC ATC CCA GCT TCC GCT 573


(2) INFORMATION FOR SEQ ID NO: 118:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 




(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: IND8 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 118: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 
AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78 
GGT GGC CAG ATC GTT GGT GGA GTT TAC CTG TTG CCG CGC 117 
AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT AGG AAG ACT 156 
TCC GAG CGG TCG CAA CCT CGT GGA AGG CGA CAA CCT ATC 195 
CCC AAG GCT CGC CGG CCC GAG GGT AGG GCC TGG GCT CAG 234 
CCC GGG CAC CCT TGG CCC CTC TAT GGC AAT GAG GGC TTG 273 
GGG TGG GCA GGA TGG CTC CTG TCA CCC CGC GGC TCT CGG 312 
CCT AGT TGG GGC CCC ACA GAC CCC CGG CGT AGG TCG CGT 351 
AAT TTG GGT AAG GTC ATC GAT ACC CTC ACA TGC GGC TTC 390 
GCC GAC CTC ATG GGG TAC ATC CCG CTC GTC GGC GCC CCC 429 
CTA GGG GGT GCT GCC AGG GCC CTG GCG CAT GGC GTC CGG 468 
GTC CTG GAG GAC GGC GTG AAC TAT GCA ACA GGG AAC TTG 507 
CCC GGT TGC TCT TTC TCT ATC TTC CTT TTG GCT TTG CTA 546 
TCC TGT TTG ACC GTC CCA GCT TCC GCT 573 


(2) INFORMATION FOR SEQ ID NO: 119: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S9

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 119:


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 
AAC ACC AAC CGC CGC CCA CAG GAC GTT AAG TTC CCG GGC 78 
GGT GGT CAG ATC GTC GGT GGA GTT TAC CTG TTG CCG CGC 117 
AGG GGC CCC AGG TTG GGT GTG CGC GCA ACT AGG AAG ACT 156 
TCC GAG CGG TCG CAA CCT CGT GGA AGG CGA CAA CCT ATC 195 
CCC AAG GCT CGC CAT CCC GAG GGC AGG GCC TGG GCT CAG 234 
CCC GGG TAC CCT TGG CCC CTC TAC GGC AAT GAG GGC TTG 273 
GGG TGG GCA GGA TGG CTC CTG TCA CCC CGT GGC TCT CGG 312 
CCT AGT TGG GGC CCC AAT GAC CCC CGG CGT AGG TCG CGT 351 
AAT TTG GGT AAG GTC ATC GAT ACC CTC ACA TGC GGC TTT 390 
GCC GAC CTC ATG GGG TAC ATT CCG CTC GTC GGC GCC CCC 429 
CTA GGG GGC GCT GCC AGG GCT CTG GCG CAT GGC GTC CGG 468 
GTT CTG GAG GAC GGC GTG AAC TAT GCA ACA GGG AAC CTC 507 
CCC GGT TGC TCT TTC TCT ATC TTC CTT CTG GCT TTG CTG 546 
TCC TGT TTG ACC ATC CCA GCT TCC GCT 573 


(2) INFORMATION FOR SEQ ID NO: 120:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 




(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: HK3 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 120: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 
AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78 
GGT GGT CAG ATC GTT GGT GGA GTT TAC CTG TTG CCG CGC 117 
AGG GGC CCC AGG TTG GGT GTG CGC GCG ACC AGG AAG ACT 156 
TCA GAG CGG TCG CAA CCT CGT GGA AGG CGA CAA CCT ATC 195 
CCC AAG GCT CGC CAA CCC GAG GGC AGG ACC TGG GCT CAG 234 
CCC GGG TAT CCT TGG CCC CTC TAT GGC AAC GAG GGC ATG 273 
GGG TGG GCA GGA TGG CTC CTG TCA CCC CGC GGC TCT CGG 312 
CCT AAT TGG GGC CCC ACG GAC CCC CGG CGT AGG TCG CGC 351 
AAT TTG GGT AAG GTC ATC GAT ACC CTC ACG TGC GGC TTC 390 
GCC GAC CTC ATG GGG TAC ATC CCG CTC GTC GGT GCC CCC 429 
CTA GGG GGC GTT GCC AGA GCC TTG GCA CAT GGT GTC CGG 468 
GTT CTG GAG GAC GGC GTG AAC TAT GCA ACA GGG AAT TTA 507 
CCC GGT TGC TCT TTC TCT ATC TTC CTC TTG GCT TTG CTG 546 
TCC TGC TTG ACC ACC CCA GCT TCC GCT 573 


(2) INFORMATION FOR SEQ ID NO: 121:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: HK5 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 121: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39

AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78

GGT GGT CAG ATC GTT GGT GGA GTT TAC CTG TTG CCG CGC 117 

AGG GGC CCC AGG TTG GGT GTG CGC GCG ACC AGG AAG ACT 156 

TCC GAG CGG TCG CAA CCT CGT GGA AGG CGA CAA CCT ATC 195 

CCC AAG GCT CGC CGA CCC GAG GGC AGG ACC TGG GCT CAG 234 

CCC GGG TAT CCT TGG CCC CTC TAT GGC AAT GAG GGC ATG 273 

GGG TGG GCA GGA TGG CTC CTG TCA CCC CAT GGC TCT CGG 312 

CCT AGT TGG GGC CCC ACG GAC CCC CGG CGT AGG TCG CGT 351 

AAT TTG GGT AAG GTC ATC GAT ACC CTC ACG TGC GGC TTC 390

GCC GAC CTC ATG GGG TAC ATC CCG CTC GTC GGC GCC CCC 429 

CTA GGG GGC GTT GCC AGA GCC CTG GCA CAC GGT GTC CGG 468 

GTT CTG GAG GAC GGC GTG AAC TAC GCA ACA GGG AAT ATA 507 

CCC GGT TGC TCT TTC TCT ATC TTC CTT TTG GCT TTG CTG 546 

TCC TGT CTG ACC ACC CCA GTT TCC GCT 573 




(2) INFORMATION FOR SEQ ID NO: 122: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: HK4 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 122: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAG ACC AAA CGT 39 

AAC ACC AAC CGC CGC CCA CAG GAC GTT AAG TTC CCG GGC 78 

GGT GGC CAG ATC GTC GGT GGA GTT TAC CTG TTG CCG CGC 117 

AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT AGG AAG ACT 156 

TCC GAG CGG TCG CAA CCT CGT GGA AGG CGA CAA CCT ATC 195 

CCC AAG GCT CGC CAA CCC GAG GGC AGG ACC TGG GCT CAG 234 

CCC GGG TAC CCT TGG CCC CTC TAT GGC AAT GAG GGC ATG 273 

GGG TGG GCA GGA TGG CTC CTG TCA CCC CGC GGC TCT CGG 312

CCT AGT TGG GGC CCC ACG GAC CCC CGG CGT AGG TCG CGC 351

AAT TTG GGT AAG GTC ATC GAT ACC CTC ACA TGC GGC TTC 390

GCC GAC CTC ATG GGG TAC ATT CCG CTC GTC GGC GCC CCC 429

TTA GGG GGC GTT GCC AGA GCC CTG GCA CAT GGT GTC CGG 468

GTT GTG GAG GAC GGC GTG AAC TAT GCA ACA GGG AAT TTG 507

CCC GGT TGC TCT TTC TCT ATC TTC CTC TTG GCT CTG CTG 546

TCC TGT TTG ACC ATC CCA GCT TCC GCT 573

(2) INFORMATION FOR SEQ ID NO: 123:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: P8

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 123:


ATG AGC ACG ACT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39
AAC ACC AGC CGC CGC CCA CAG GAC GTT AAG TTC CCG GGC 78
GGT GGT CAG ATC GTT GGT GGA GTT TAC CTG TTG CCG CGC 117
AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT AGG AAG ACT 156
TCC GAG CGA TCG CAA CCT CGT GGC AGG CGA CAA CCT ATC 195
CCC AAG GCT CGC CGG CCC GAG GGT AGG GCC TGG GCT CAG 234
CCC GGG CAC CCT TGG CCC CTC TAT GCC AAT GAG GGC TTG 273
GGG TGG GCG GGA TGG CTC CTG TCA CCC CGC GGC TCC CGG 312
CCT AGT TGG GGC CCC ACG GAC CCC CGG CGT AGG TCG CGC 351
AAT TTG GGT AAG GTC ATC GAT ACC CTC ACA TGC GGC TTC 390
GCC GAC CTC ATG GGG TAC ATT CCG CTC GTC GGC GGC CCC 429
CTA GGG GGC GTT GCC AGG GCC CTG GCG CAT GGC GTC CGG 468
GTT GTG GAG GAC GGC GTG AAC TAT GCA ACA GGG AAT CTG 507
CCT GGT TGC TCT TTC TCT ATC TTC CTT TTG GCT TTG CTG 546
TCT TGT CTG ACC ATC CCA GCT TCC GCT 573


(2) INFORMATION FOR SEQ ID NO: 124:


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: T3 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 124:


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39
AAC ACC AAC CGC CGC CCA CAG GAC GTT AAG TTC CCG GGC 78
GGT GGT CAG ATC GTT GGT GGA GTT TAC CTG TTG CCG CGC 117
AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT AGG AAG ACT 156
TCC GAG CGG TCG CAA CCT CGT GGA AGG CGA CAA CCT ATC 195
CCC AAG GCT CGC CGG CCC GAG GGT AGG GCC TGG GCT CAG 234
CCC GGG TAC CCT TGG CCC CTC TAT GGC GAC GAG GGC ATG 273
GGG TGG GCA GGA TGG CTC CTG TCA CCC CGC GGC TCC CGG 312
CCT AAT TGG GGC CCC ACA GAC CCC CGG CGT AGG TCG CGT 351
AAT CTG GGT AAG GTC ATC GAT ACC CTC ACA TGC GGC TTC 390
GCC GAC CTC ATG GGG TAC ATT CCG CTC GTC GGC GCT CCC 429
TTA GGG GGC GTT GCC AGG GCC CTG GCG CAT GGC GTC CGG 468
GTT CTG GAG GAC GGC GTG AAT TAC GCA ACA GGG AAT TTG 507
CCT GGT TGC TCT TTC TCT ATC TTC CTC TTG GCT TTG CTG 546
TCC TGC TTG ACC ATC CCA GCT TCC GCT 573

(2) INFORMATION FOR SEQ ID NO: 125:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: T4

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 125:


ATG AGC ACA AAT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39
AAC ACC AAC CGT CGC CCA CAG GAC GTT AAG TTC CCG GGC 78
GGC GGC CAG ATC GTT GGC GGA GTA TAC TTG TTG CCG CGC 117
AGG GGC CCC AGG TTG GGT GTG CGC GCG ACA AGG AAG ACT 156
TCG GAG CGA TCC CAG CCA CGT GGG AGG CGC CAG CCC ATC 195
CCC AAA GAT CGG CGC TCC ACT GGC AAG TCC TGG GGA AAA 234
CCA GGA TAT CCC TGG CCC CTG TAT GGG AAT GAG GGA CTC 273
GGC TGG GCA GGA TGG CTC CTG TCC CCC CGA GGT TCC CGT 312 
CCC TCC TGG GGC CCC AAT GAC CCC CGG CAT AGG TCG CGC 351 
AAC GTG GGT AAG GTC ATC GAT ACC CTA ACG TGC AGC CTT 390 
GCC GAC CTC ATG GGG TAC GTC CCC GTC GTA GGC GGC CCG 429 
TTG GGT GGC GTC GCC AGA GCT CTC GCG CAT GGC GTG AGA 468 
GTC CTG GAG GAC GGG GTT AAT TAT GCA ACA GGG AAC TTA 507 
CCT GGT TGC TCC TTT TCT ATT TTC TTG CTG GCC CTA CTG 546 
TCC TGC ATC ACC ATT CCA GTC TCC GCT 573 


(2) INFORMATION FOR SEQ ID NO: 126:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: US10

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 126:


ATG AGC ACA AAT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39
AAC ACT AAC CGT CGC CCA CAA GAC GTT AAG TTT CCG GGC 78
GGC GGC CAG ATC GTT GGC GGA GTA TAC TTG TTG CCG CGC 117
AGG GGC CCC AGG TTG GGT GTG CGC GCG ACA AGG AAG ACT 156
TCG GAG CGG TCC CAG CCA CGT GGG AGG CGC CAG CCC ATC 195
CCC AAA GAT CGG CGC CCC ACT GGC AAG TCC TGG GGA AAA 234
CCA GGA TAC CCT TGG CCC CTA TAT GGG AAT GAG GGA CTC 273
GGC TGG GCA GGA TGG CTC CTG TCC CCC CGA GGT TCC CGT 312
CCC TCT TGG GGC CCC ACT GAT CCC CGG CAT AGG TCG CGC 351
AAC GTG GGT AAG GTC ATC GAT ACC CTA ACG TGC GGC TTT 390
GCC GAC CTC ATG GGA TAC ATC CCC GTC GTG GGC GCT CCG 429
CTT GGT GGC GTC GCC AGA GCT CTC GCG CAT GGC GTG AGG 468
GTC CTG GAG GAC GGG GTT AAT TAT GCA ACA GGG AAC TTA 507
CCC GGT TGC TCC TTT TCT ATC TTC TTG CTG GCC TTA CTG 546
TCC TGC ATC ACC ATT CCA GTC TCT GCT 573

(2) INFORMATION FOR SEQ ID NO: 127:






(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: T9

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 127:


ATG AGC ACA AAT CCA AAA CCC CAA AGA AAA ACC ATA AGA 39 
AAC ACC AAC CGT CGC CCA CAG GAC GTT AAG TTC CCG GGC 78 
GGC GGC CAG ATC GTT GGC GGA GTA TAC TTG TTG CCG CGC 117 
AGG GGC CCT AGG TTG GGT GTG CGC ACG ACA AGG AAG ACT 156 
TCG GAG CGG TCC CAG CCA CGT GGG AGG CGC CAG CCC ATC 195 
CCC AAA GAT CGG CGC TCC ACT GGC AAG TCC TGG GGA AAA 234 
CCA GGA TAC CCC TGG CCT CTA TAT GGG AAT GAG GGA CTC 273 
GGC TGG GCG GGA TGG CTC CTG TCC CCC CGA GGT TCC CGT 312 
CCC TCT TGG GGC CCC AGT GAC CCC CGG CAT AGG TCG CGC 351 
AAC GTG GGT AAG GTC ATC GAT ACC CTA ACG TGC GGC TTT 390 
GCC GAC CTC ATG GGG TAC ATC CCC GTC GTA GGC GCC CCG 429 
CTT GGT GGC GTT GCC AGA GCT CTC GCG CAC GGC GTG AGA 468 
GTC CTG GAG GAC GGG GTT AAT TAT GCA ACA GGG AAC CTA 507 
CCT GGT TGC TCT TTT TCT ATC TTC TTG CTG GCC CTA CTG 546 
TCC TGC ATC ACC ACT CCG GCC TCT GCT 573 




(2) INFORMATION FOR SEQ ID NO: 128: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 




(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: T2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 128: 


ATG AGC ACA ATT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39 
AAC ACT AAC CGT CGC CCA CAA GAC GTT AAG TTT CCG GGC 78 
GGC GGC CAG ATC GTT GGC GGA GTA TAC TTG CTG CCG CGC 117 
AGG GGC CCC AGG TTG GGT GTG CGC GCG ACA AGG AAG ACT 156 
TCG GAG CGG TCC CAG CCT CGT GGA AGG CGC CAG CCC ATC 195 
CCT AAA GAT CGG CGC TCC ACT GGC AAG TCC TGG GGA AAA 234 
CCA GGA TAC CCC TGG CCC CTG TAT GGG AAT GAG GGG CTC 273 
GGC TGG GCA GGA TGG CTC CTG TCC CCC CGA GGT TCT CGT 312 
CCC TCT TGG GGC CCC AAT GAC CCC CGG CAT AGG TCG CGC 351 
AAT GTG GGT AAA GTC ATC GAT ACC CTA ACG TGC GGC TTT 390 
GCC GAC CTC ATG GGG TAC ATC CCC GTC GTA GGC GCC CCG 429 
CTT GGT GGT GTC GCC AGA GCT CTT GCG CAT GGC GTG AGA 468 
GTC CTG GAG GAC GGA GTT AAT TAT GCA ACA GGT AAC TTA 507 
CCC GGT TGC TCC TTT TCT ATC TTC TTG CTA GCC CTG CTG 546 
TCC TGC ATC ACT ATT CCG GTT TCA GCT 573 


(2) INFORMATION FOR SEQ ID NO: 129:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: T8 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 129:


ATG AGC ACA AAT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39 

AAC ACA AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGT 78 

GGC GGC CAG ATC GTT GGC GGA GTT TAC TTG CTG CCG CGC 117

AGG GGC CCT AGG TTG GGT GTG CGC GCG ACA AGG AAG ACT 156 

TCC GAG CGA TCC CAG CCG CGT GGG AGA CGC CAG CCC ATC 195 

CCG AAA GAT CGG CGC TCC ACC GGC AAG TCC TGG GGA AAA 234 

CCA GGA TAT CCT TGG CCT CTT TAC GGA AAC GAG GGC TGC 273 

GGT TGG GCA GGT TGG CTC CTG TCC CCC CGC GGG TCT CGT 312 

CCT ACT TGG GGC CCC ACT GAC CCC CGG CAT AGA TCA CGT 351 

AAT TTG GGC AGA GTC ATC GAT ACC ATT ACA TGT GGT TTT 390 

GCC GAC CTC ATG GGG TAC ATC CCT GTC GTT GGC GCC CCG 429

GTC GGA GGC GTC GCC AGA GCT CTG GCA CAT GGT GTT AGG 468 

GTC CTG GAA GAC GGG ATA AAC TAT GCA ACA GGG AAT TTG 507 

CCT GGT TGC TCT TTT TCT ATC TTC TTG CTT GCT CTT CTG 546 

TCA TGC TTC ACA GTG CCA GTG TCT GCA 573 


(2) INFORMATION FOR SEQ ID NO: 130:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: US1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 130: 


ATG AGC ACA AAT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39 

AAC ACA AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGT 78

GGC GGT CAG ATC GTT GGC GGA GTT TAC TTG CTG CCG CGC 117 

AGG GGC CCC AGG TTG GGT GTG CGC GCG ACA AGG AAG ACT 156 

TCC GAG CGA TCC CAG CCG CGT GGG AGA CGC CAG CCC ATC 195 

CCG AAA GAT CGG CGC TCC ACC GGC AAG TCC TGG GGA AAG 234 

CCA GGA TAT CCT TGG CCT CTG TAC GGA AAC GAG GGC TGC 273 

GGC TGG GCA GGT TGG CTC CTG TCC CCC CGC GGG TCT CGT 312 

CCT ACT TGG GGC CCC ACT GAC CCC CGG CAC AGA TCA CGT 351

AAC TTG GGC AAG GTC ATC GAT ACC ATT ACG TGT GGT TTT 390
GCC GAC CTC ATG GGG TAC ATC CCT GTC GTT GGC GCC CCG 429 
GTC GGA GGC GTC GCC AGA GCT CTG GCA CAC GGT GTT AGG 468 
GTC CTG GAA GAC GGG ATA AAT TAC GCA ACA GGG AAT CTG 507 
CCT GGT TGC TCC TTT TCT ATC TTC TTA CTT GCT CTT CTG 546 
TCG TGC GCC ACG GTG CCG GTG TCT GCA 573 


(2) INFORMATION FOR SEQ ID NO: 131:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK11 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 131: 


ATG AGC ACA AAT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39

AAT ACA AAC CGC CGC CCA CAG GAC GTT AAG TTC CCG GGT 78 

GGC GGC CAG ATC GTT GGC GGA GTT TAC TTG CTG CCG CGC 117 

AGG GGC CCC AGG TTG GGT GTG CGC ACG ACA AGG AAG ACT 156 

TCC GAG CGA TCC CAG CCG CGT GGG AGA CGC CAG CCC ATC 195 

CCG AAA GAT CGG CGC TCC ACC GGC AAG CCC TGG GGA AAG 234 

CCA GGA TAT CCT TGG CCC CTG TAT GGA AAC GAG GGC TGC 273 

GGC TGG GCA GGT TGG CTC CTG TCC CCC CGC GGG TCT CAT 312

CCT AAT TGG GGC CCC ACT GAC CCC CGG CAT AAA TCA CGC 351 

AAT TTG GGT AAA GTC ATC GAC ACC ATT ACG TGT GGT TTT 390 

GCC GAC CTC ATG GGG TAC ATC CCT GTC GTC GGC GCC CCG 429 

GTC GGA GGC GTC GCC AGA GCT CTG GCA CAC GGT GTT AGA 468 

GTC CTG GAA GAC GGG ATA AAT TAC GCA ACA GGG AAT CTG 507 

CCT GGT TGC TCT TTT TCT ATC TTC TTA CTT GCT CTT CTG 546 

TCA TGC TGC ACA GTG CCA GTG TCT GCG 573 



(2) INFORMATION FOR SEQ ID NO: 132:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SW3 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 132:


ATG AGC ACA AAT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39
AAT ACA AAC CGC CGC CCA CAG GAC GTT AAG TTC CCG GGT 78
GGC GGC CAG ATC GTT GGC GGA GTT TAC TTG CTG CCG CGC 117
AGG GGC CCC AGG TTG GGT GTG CGC GCG ACA AGG AAG ACT 156
TCC GAG CGA TCC CAG CCG CGT GGG AGA CGC CAG CCC ATC 195
CCG AAA GAT CGG CGC TCC ACC GGC AAG TCC TGG GGA AAG 234
CCA GGA TAT CCT TGG CCC CTG TAT GGA AAC GAG GGC TGC 273
GGC TGG GCA GGT TGG CTC CTG TCC CCC CGC GGG TCT CAT 312
CCT AAT TGG GGC CCC ACT GAC CCC CGG CAT AGA TCA CGC 351
AAT TTG GGC AAA GTC ATC GAC ACC ATT ACG TGT GGT TTT 390
GCC GAC CTC ATG GGG TAC ATC CCT GTC GTT GGC GCC CCG 429
GTC GGA GGC GTC GCC AGA GCT CTG GCA CAC GGT GTT AGA 468
GTC CTG GAA GAC GGG ATA AAT TAC GCA ACA GGG AAT CTG 507
CCT GGT TGC TCT TTT TCT ATC TTC TTA CTT GCT CTT CTG 546
TCG TGC TTC ACA GTG CCA GTG TCT GCG 573

(2) INFORMATION FOR SEQ ID NO: 133:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homo sapiens

(C) INDIVIDUAL ISOLATE: DK8

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 133:


ATG AGC ACA AAT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39
AAC ACA AAC CGC CGC CCA CAG GAC GTT AAG TTC CCG GGT 78
GGC GGC CAG ATC GTT GGC GGA GTT TAC TTG CTG CCG CGC 117
AGG GGC CCC AGG TTG GGT GTG CGC GCG ACA AGG AAG TCT 156
TCC GAG CGA TCC CAG CCG CGT GGG AGG CGC CAG CCC ATC 195
CCG AAA GAT CGG CGC TCC ACC GGC AAG TCC TGG GGA AAA 234
CCG GGA TAT CCT TGG CCC CTG TAT GGA AAC GAG GGC TGC 273
GGC TGG GCA GGT TGG CTC CTG TCC CCC CGC GGG TCT CGT 312
CCT ACT TGG GGC CCC ACT GAC CCC CGG CAT AGA TCA CGC 351
AAT TTG GGC AAA GTC ATC GAC ACC ATT ACG TGT GGT TTT 390
GCC GAC CTC ATG GGG TAC ATC CCT GTC GTT GGC GCC CCG 429
GTT GGA GGC GTC GCC AGA GCT CTG GCA CAC GGT GTT AGG 468
GTC CTG GAA GAC GGG ATA AAT TAC GCA ACA GGG AAT TTG 507
CCT GGT TGC TCT TTT TCT ATC TTC TTG CTT GCT CTT CTG 546
TCG TGC TGC ACA GTG CCA GTG TCT GCG 573


(2) INFORMATION FOR SEQ ID NO: 134: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S83 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 134:


ATG AGC ACA AAT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39
AAC ACT AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78
GGT GGC CAG ATC GTT GGC GGA GTA TAC TTG CTG CCG CGC 117
AGG GGC CCG AGA TTG GGT GTG CGC GCG ACG AGG AAA ACT 156
TCC GAA CGG TCC CAG CCA CGT GGG AGG CGC CAG CCC ATC 195
CCT AAA GAT CGG CGC ACC ACT GGC AAG TCC TGG GGA AGG 234
CCA GGA TAC CCT TGG CCC CTG TAT GGG AAT GAG GGC CTC 273
GGC TGG GCA GGG TGG CTC CTG TCC CCC CGC GGT TCT CGC 312
CCT TCA TGG GGC CCC ACC GAC CCC CGG CAT AAA TCG CGC 351
AAC TTG GGT AAG GTC ATC GAT ACC CTA ACG TGC GGT TTT 390
GCC GAC CTC ATG GGG TAC ATA CCC GTC GTT GGC GCT CCC 429
GTT GGC GGC GTT GCC AGA GCC CTC GCC CAT GGG GTG AGG 468
GTT CTG GAG GAC GGG ATA AAT TAT GCA ACG GGG AAT TTG 507
CCC GGT TGC TCT TTC TCT ATC TTT CTC TTG GCC CTC TTG 546
TCT TGC ATC TCT GTG CCA GTT TCC GCC 573


(2) INFORMATION FOR SEQ ID NO: 135:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: HK10

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 135:


ATG AGC ACA CTT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39
AAC ACC ATC CGT CGC CCA CAG GAC GTT AAG TTC CCG GGT 78
GGC GGA CAG ATC GTT GGT GGA GTA TAC GTG TTG CCG CGC 117
AGG GGC CCA CGA TTG GGT GTG CGC GCG ACG CGT AAA ACT 156
TCT GAA CGG TCG CAG CCT CGC GGA CGA CGA CAG CCT ATC 195
CCC AAG GCG CGT CGG AGC GAA GGC CGG TCC TGG GCT CAG 234
CCC GGG TAC CCT TGG CCC CTC TAT GGT AAC GAG GGC TGC 273
GGG TGG GCA GGA TGG CTC CTG TCC CCA CGC GGC TCC CGT 312
CCA TCT TGG GGC CCA AAC GAC CCC CGG CGA CGG TCC CGC 351
AAT TTG GGT AAA GTC ATC GAT ACC CTT ACG TGC GGA TTC 390
GCC GAC CTC ATG GGG TAC ATC CCG CTC GTC GGC GCT CCC 429
GTA GGA GGC GTC GCA AGA GCC CTC GCG CAT GGC GTG AGG 468
GCC CTT GAA GAC GGG ATA AAT TTC GCA ACA GGG AAC TTG 507
CCC GGT TGC TCC TTT TCT ATC TTC CTT CTT GCT CTG TTC 546
TCT TGC TTA ATT CAT CCA GCA GCT AGT 573


(2) INFORMATION FOR SEQ ID NO: 136:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S52 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 136: 


ATG AGC ACA CTT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39 
AAC ACC ATC CGT CGC CCA CAG GAC GTT AAG TTC CCG GGT 78 
GGC GGA CAG ATC GTT GGT GGA GTA TAC GTG TTG CCG CGC 117 
AGG GGC CCA CGA TTG GGT GTG CGC GCG ACG CGT AAA ACT 156 
TCT GAA CGG TCA CAG CCT CGC GGA CGA CGA CAG CCT ATC 195 
CCC AAG GCG CGT CGG AGC GAA GGC CGG TCC TGG GCT CAG 234 
CCC GGG TAC CCT TGG CCC CTC TAT GGT AAT GAG GGC TGC 273 
GGG TGG GCA GGG TGG CTC CTG TCC CCA CGC GGC TCC CGT 312 
CCA TCT TGG GGC CCA AAC GAC CCC CGG CGG AGG TCC CGC 351 
AAT TTG GGT AAA GTC ATC GAT ACC CTT ACG TGC GGA TTC 390 
GCC GAC CTC ATG GGG TAC ATC CCG CTC GTC GGC GCT CCC 429 
GTA GGA GGC GTC GCA AGA GCC CTC GCG CAT GGC GTG AGG 468 
GCC CTT GAA GAC GGG ATA AAT TTT GCA ACA GGG AAC TTG 507 
CCC GGT TGC TCC TTT TCT ATC TTC CTT CTT GCT CTG TTC 546 
TCC TGC TTA GTT CAT CCT GCA GCT AGT 573 


(2) INFORMATION FOR SEQ ID NO: 137: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 137: 

ATG AGC ACA CTT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39
AAC ACC ATC CGT CGC CCA CAG GAC ATC AAG TTC CCG GGT 78
GGC GGA CAG ATC GTT GGT GGA GTA TAC GTG TTG CCG CGC 117
AGG GGC CCA CGA TTG GGT GTG CGC GCG ACG CGT AAA ACT 156
TCT GAA CGG TCA CAG CCT CGC GGA CGG CGA CAG CCT ATC 195
CCC AAG GCG CGT CGG AGC GAA GGC CGA TCC TGG GCT CAG 234
CCC GGG TAC CCT TGG CCC CTC TAT GGT AAC GAG GGC TGC 273
GGG TGG GCA GGG TGG CTC CTG TCC CCA CGC GGC TCC CGT 312
CCA TCT TGG GGC CCA AAT GAC CCC CGG CGG AGG TCC CGC 351
AAT TTG GGT AAA GTC ATC GAT ACC CTT ACG TGC GGC TTC 390
GCC GAC CTC ATG GGG TAC ATC CCG CTC GTC GGC GCT CCC 429
GTA GGA GGC GTC GCA AGA GCC CTC GCG CAT GGC GTG AGG 468
GCC CTT GAA GAC GGG ATA AAT TTT GCA ACA GGG AAC TTG 507
CCC GGT TGC TCT TTT TCT ATC TTC CTT CTT GCC CTG TTC 546
TCT TGC TTA ATT CAT CCA GCA GCT AGT 573

(2) INFORMATION FOR SEQ ID NO: 138:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: DK12

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 138:


ATG AGC ACA CTT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39
AAC ACC ATC CGT CGC CCA CAG GAC GTC AAG TTC CCG GGT 78
GGC GGA CAG ATC GTT GGT GGA GTA TAC GTG TTG CCG CGC 117
AGG GGC CCA CGA TTG GGT GTG CGC GCG ACG CGT AAA ACT 156
TCT GAA CGG TCA CAG CCT CGC GGA CGG CGA CAG CCT ATC 195
CCC AAG GCG CGT CGG AGC GAA GGC CGG TCC TGG GCT CAG 234
CCT GGG TAC CCT TGG CCC CTC TAT GGT AAC GAG GGC TGC 273
GGG TGG GCA GGG TGG CTC CTG TCC CCA CGC GGC TCC CGT 312
CCA TCT TGG GGC CCA AAC GAC CCC CGG CGG AGG TCC CGC 351
AAT TTG GGT AAG GTC ATC GAT ACC CTC ACG TGC GGA TTC 390
GCC GAC CTC ATG GGG TAC ATC CCG CTC GTC GGC GCT CCT 429
GTA GGG GGC GTC GCA AGA GCC CTC GCG CAT GGC GTG AGG 468
GCC CTT GAA GAC GGG ATA AAT TTC GCA ACA GGG AAC TTG 507
CCC GGT TGC TCC TTT TCT ATC TTC CTT CTT GCT CTG TTC 546
TCT TGC CTA ATT CAT CCA GCA GCT AGT 573

(2) INFORMATION FOR SEQ ID NO: 139:






(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 


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(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z4 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 139: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 

AAC ACC AAC CGC CGC CCC ATG GAC GTA AAG TTC CCG GGT 78 

GGT GGC CAG ATC GTT GGC GGA GTT TAC TTG TTG CCG CGC 117 

AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT CGA AAG ACT 156 

TCG GAG CGG TCG CAA CCT CGT GGC AGG CGT CAA CCT ATC 195 

CCC AAG GCG CGC CAG CCA GAG GGC AGA TCC TGG GCG CAG 234 

CCC GGG TAC CCT TGG CCC CTC TAT GGC AAT GAG GGC TGC 273

GGG TGG GCA GGG TGG CTC CTG TCT CCT CGC GGC TCT CGG 312 

CCA TCT TGG GGC CCA AAT GAT CCC CGG CGG AGA TCG CGC 351 

AAT CTG GGT AAG GTC ATC GAT ACC CTG ACG TGC GGC TTC 390 

GCC GAC CTC ATG GGA TAC ATC CCG ATC GTG GGC GCC CCC 429 

GTG GGG GGC GTC GCC AGG GCT CTG GCG CAT GGC GTC AGG 468 

GCT GTG GAG GAC GGG ATT AAC TAT GCA ACA GGG AAT CTT 507 

CCC GGT TGC TCT TTC TCT ATC TTC CTT TTG GCA CTT CTT 546 

TCG TGC CTC ACT GTT CCA GCG TCG GCT 573


(2) INFORMATION FOR SEQ ID NO: 140: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z8 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 140:


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39 
AAC ACC AAC CGC CGC CCT ATG GAT GTA AAA TTC CCA GGC 78 
GGC GGC CAG ATC GTT GGC GGA GTT TAC TTG TTG CCG CGC 117 
AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT CGG AAG ACT 156 
TCG GAG CGG TCG CAA CCT CGT GGC AGG CGT CAG CCT ATC 195 
CCC AAG GCA CGT CGG TCC GAG GGT AGG TCC TGG GCT CAG 234 
CCC GGG TAC CCA TGG CCT CTT TAC GGT AAT GAA GGC TGT 273 
GGG TGG GCA GGT TGG CTC CTG TCC CCC CGC GGC TCT CGA 312 
CCG TCT TGG GGC CCA AAT GAT CCC CGG CGG AGG TCG CGC 351 
AAT TTG GGT AAG GTC ATC GAT ACC CTC ACG TGC GGC TTC 390 
GCC GAC CTC ATG GGA TAC ATC CCG CTC GTG GGC GCC CCA 429 
GTA GGA GGC GTC GCC AGA GCC CTG GCG CAT GGC GTC AGG 468 
GCT GTG GAG GAC GGG ATC AAC TAT GCA ACA GGG AAC CTT 507 

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CCT GGT TGC TCT TTC TCT ATC TTC CTC TTG GCA CTT CTC 546 

TCG TGC CTA ACC GTC CCA GCG TCT GCT 573 


(2) INFORMATION FOR SEQ ID NO: 141:




(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 141: 


ATG AGC ACA AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39

AAC ACC AAC CGT CGC CCC ATG GAT GTG AAA TTC CCG GGC 78

GGC GGC CAG ATC GTT GGC GGA GTT TAC TTG CTG CCG CGC 117

AGG GGC CCC CGG TTG GGT GTG CGC GCA GCT CGG AAG ACT 156

TCG GAG CGG TCA CAA CCT CGT GGC AGG CGT CAG CCT ATC 195

CCC AAG GCG CGC CGG TCC GAG GGC AGG TCC TGG GCT CAG 234

CCC GGG TAC CCT TGG CCC CTT TAC GGC AAT GAG GGC TGT 273

GGG TGG GCA GGG TGG CTC CTG TCC CCC CGC GGT TCC AGG 312

CCG TCT TGG GGC CCC AAT GAT CCC CGG CGT AGG TCC CGT 351

AAT CTG GGT AAA GTC ATC GAT ACC CTG ACG TGT GGC TTC 390

GCC GAC CTC ATG GGA TAC ATT CCG CTC GTA GGC GCC CCT 429

GTG GGT GGC GTC GCC AGG GCC CTG GCG CAT GGC GTC AGG 468

GCC GTG GAG GAC GGA ATT AAC TAC GCA ACA GGG AAC CTT 507

CCT GGT TGC TCT TTC TCT ATC TTT CTT CTT GCA CTT CTC 546

TCG TGC CTG ACA ACA CCA GCA TCT GCC 573


(2) INFORMATION FOR SEQ ID NO: 142:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z5 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 142: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39

AAC ACC AAC CGC CGC CCC ATG GAT GTA AAA TTC CCG GGT 78

GGT GGT CAG ATC GTT GGC GGA GTT TAC TTG TTG CCG CGC 117

AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT CGG AAG ACT 156

TCG GAG CGG TCG CAA CCT CGC GGC AGG CGT CAG CCT ATC 195

CCC CAG GCA CGT CGG TCC GAG GGC AGG TCC TGG GCT CAG 234

CCC GGG TAC CCT TGG CCT CTT TAT GGC AAT GAG GGC TGT 273

GGG TGG GCA GGG TGG CTC CTG TCC CCC CGC GGA TCT CGG 312

CCA TCT TGG GGC CAA AAT GAT CCC CGG CGT AGG TCC CGC 351

AAT CTG GGT AAG GTC ATC GAT ACC CTG ACG TGT GGC TTC 390

GCC GAC CTC ATG GGA TAC ATT CCG CTC GTC GGC GCC CCA 429

GTA GGT GGC GTC GCC AGG GCC TTG GCG CAT GGC GTC AGG 468

GCC CTG GAG GAC GGA ATC AAC TAT GCA ACA GGG AAT CTT 507

CCT GGT TGC TCC TTT TCT ATC TTC CTA CTT GCA CTT TTC 546

TCG TGC TTG ACA ACA CCG GCA TCC GCT 573

(2) INFORMATION FOR SEQ ID NO: 143:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: Z6

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 143:


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39

AAC ACC AAC CGC CGC CCC ATG GAC GTT AAG TTC CCG GGT 78

GGT GGC CAG ATC GTT GGC GGA GTT TAC TTG TTG CCG CGC 117

AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT AGG AAG ACT 156

TCG GAG CGG TCG CAA CCT CGT GGG AGA CGC CAG CCT ATC 195

CCC AAG GCA CGT CGA TCT GAG GGA AGG TCC TGG GCT CAG 234

CCC GGG TAT CCA TGG CCT CTT TAC GGT AAT GAG GGT TGC 273

GGG TGG GCG GGA TGG CTC CTG TCA CCC CGT GGC TCT CGA 312

CCG TCT TGG GGT CCA AAT GAT CCC CGG CGA AGG TCC CGC 351

AAC TTG GGT AAG GTC ATC GAT ACT CTA ACT TGC GGT TTC 390

GCC GAT CTC ATG GGA TAC ATC CCG CTC GTA GGC GCC CCC 429

GTG GGC GGC GTC GCC AGG GCC CTG GCA CAT GGT GTT AGG 468

GCT GTG GAG GAC GGG ATC AAT TAT GCA ACA GGG AAT CTT 507

CCC GGT TGC TCT TTC TCT ATC TTC CTC TTG GCA CTT CTT 546

TCG TGC CTA ACT GTT CCC ACC TCG GCC 573


(2) INFORMATION FOR SEQ ID NO: 144:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: Z7

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 144:


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39

AAC ACC AAC CGC CGC CCC ATG GAC GTT AAG TTC CCG GGC 78

GGT GGC CAG ATC GTT GGC GGA GTT TAC TTG TTG CCG CGC 117

AGG GGC CCC AGA TTG GGT GTG CGC ACA ACT AGG AAG ACT 156

TCG GAG CGG TCG CAA CCT CGT GGG AGA CGT CAG CCT ATC 195

CCC AAG GCA CGT CGA TCT GAG GGA AGG TCC TGG GCT CAA 234

CCC GGG TAC CCA TGG CCT CTT TAC GGT AAC GAG GGT TGC 273

GGG TGG GCA GGA TGG CTC TTG TCA CCC CGT GGC TCT CGA 312

CCG TCT TGG GGC CCA AAT GAT CCC CGG CGA AGG TCC CGC 351

AAC TTG GGT AAG GTC ATC GAT ACC CTA ACC TGC GGC TTT 390

GCC GAC CTC ATG GGA TAC ATC CCG CTC GTA GGC GCC CCC 429

GTG GGC GGC GTC GCC AGG GCC CTA GCG CAT GGC GTT AGG 468

GCT CTG GAG GAC GGG ATT AAT TAT GCA ACA GGG AAC CTT 507

CCC GGT TGC TCT TTT TCT ATC TTC CTC TTG GCA CTT CTT 546

TCG TGC CTG ACT GTT CCC GCC TCG GCC 573


(2) INFORMATION FOR SEQ ID NO: 145:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: DK13

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 145:


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA CGT 39

AAC ACC AAC CGC CGC CCA ATG GAC GTT AAG TTC CCG GGT 78

GGC GGC CAG ATC GTT GGC GGA GTT TAC TTG TTG CCG CGC 117

AGG GGC CCT AGA TTG GGT GTG CGC GCG ACT AGG AAG ACT 156

TCG GAG CGG TCG CAA CCT CGT GGG AGG CGC CAG CCT ATC 195

CCC AAG GCG CGC CAA CTC GAG GGT AGG TCC TGG GCT CAG 234

CCT GGG TAT CCT TGG CCC CTT TAC GGC AAT GAG GGC TGC 273

GGG TGG GCG GGA TGG CTC CTG TCA CCC CGT GGC TCT CGG 312

CCG TCT TGG GGC CCG AAT GAT CCC CGG CGG AGG TCC CGC 351

AAC TTG GGT AAG GTC ATC GAT ACC CTA ACT TGC GGC TTC 390

GCC GAC CTC ATG GGA TAC ATC CCG GTC GTA GGC GCC CCC 429

GTG GGT GGC GTC GCC AGA GCC CTG GCG CAT GGC GTC AGG 468

CTT CTG GAG GAC GGG GTC AAT TAT GCA ACA GGG AAT CTT 507

CCC GGT TGC TCT TTC TCT ATC TTC CTC TTG GCA CTG CTC 546

TCG TGC CTG ACT GTT CCC GCT TCG GCC 573



(2) INFORMATION FOR SEQ ID NO: 146: 




(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA4 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 146:


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39

AAC ACC AAC CGC CGC CCA CAG GAC GTT AAG TTC CCG GGC 78

GGT GGT CAG ATC GTT GGT GGA GTC TAC TTG TTG CCG CGC 117

AGG GGC CCT AGG TTG GGT GTG CGC GCG ACT CGG AAG ACT 156

TCA GAA CGG TCG CAA CCC CGT GGG CGG CGC CAG CCT ATT 195

CCC AAG GCG CGC CAA CCC ACG GGC CGG TCC TGG GGT CAA 234

CCC GGG TAC CCT TGG CCC CTT TAC GCC AAT GAG GGC CTC 273

GGG TGG GCA GGG TGG TTG CTC TCC CCC CGA GGC TCT CGG 312

CCT AAT TGG GGC CCC AAT GAC CCC CGG CGA AAG TCG CGC 351

AAT TTG GGT AAG GTC ATC GAT ACC CTA ACG TGC GGA TTC 390

GCC GAC CTC ATG GGG TAC ATC CCG CTC GTA GGC GGC CCC 429

GTT GGG GGC GTC GCA AGG GCC CTT GCA CAT GGT GTG AGG 468

GTT CTT GAG GAC GGG GTA AAC TAT GCA ACG GGG AAT TTG 507

CCC GGT TGC TCT TTC TCT ATC TTT ATC CTT GCA CTT CTC 546

TCG TGC CTG ACC GTC CCG GCC TCT GCA 573


(2) INFORMATION FOR SEQ ID NO: 147:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: SA5

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 147:


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39

AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78

GGT GGT CAG ATC GTT GGT GGA GTT TAC TTG TTG CCG CGC 117

AGG GGC CCT AGA TTG GGT GTG CGC GCG ACT CGG AAG ACT 156

TCA GAA CGG TCG CAA CCC CGT GGG CGG CGC CAG CCT ATT 195

CCC AAG GCG CGC CAA CCC ACG GGC CGG TCC TGG GGT CAA 234

CCC GGG TAC CCT TGG CCC CTT TAC GCC AAT GAG GGC CTC 273

GGG TGG GCA GGG TGG TTG CTC TCC CCC CGA GGC TCT CGG 312

CCT AAT TGG GGC CCC AAT GAC CCC CGG CGA AAA TCG CGC 351

AAT TTG GGT AAG GTC ATC GAT ACC CTA ACG TGC GGA TTC 390

GCC GAC CTC ATG GGG TAC ATC CCG CTC GTA GGC GGC CCC 429

GTT GGG GGC GTC GCA AGG GCC CTC GCA CAT GGT GTG AGG 468

GTT CTT GAG GAC GGG GTA AAC TAT GCA ACA GGG AAT TTG 507

CCC GGT TGC TCT TTC TCT ATC TTT ATC CTT GCA CTT CTC 546

TCG TGC TTG ACC GTC CCA GCC TCT GCA 573


(2) INFORMATION FOR SEQ ID NO: 148:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: SA7

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 148:


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39

AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78

GGT GGT CAG ATC GTT GGT GGA GTT TAC TTG TTG CCG CGC 117

AGG GGC CCT AGG TTG GGT GTG CGC GCG ACT CGG AAG ACT 156

TCA GAA CGG TCG CAA CCC CGT GGG CGG CGC CAG CCT ATT 195

CCC AAG GCG CGC CAA CCC ACG GGC CGG TCC TGG GGT CAA 234

CCC GGG TAC CCT TGG CCC CTT TAC GCC AAT GAG GGC CTC 273

GGG TGG GCA GGG TGG TTG CTC TCC CCC CGA GGC TCT CGG 312

CCT AAT TGG GGC CCC AAT GAC CCC CGG CGA AAG TCG CGC 351

AAT TTG GGT AAG GTC ATC GAC ACC CTA ACA TGC GGA TTC 390

GCC GAC CTC ATG GGG TAC ATC CCG CTC GTA GGC GGC CCC 429

GTT GGG GGC GTC GCA AGG GCT CTC GCA CAC GGT GTG AGG 468

GTT CTT GAG GAC GGG GTA AAT TAC GCA ACA GGG AAT CTG 507

CCC GGT TGC TCT TTC TCT ATC TTT ATC CTT GCA CTT CTC 546

TCG TGC CTG ACC GTC CCA GCC TCC GCA 573


(2) INFORMATION FOR SEQ ID NO: 149:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: SA1


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(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 149: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39

AAC ACC AAC CTC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78

GGT GGT CAG ATC GTT GGT GGA GTT TAC TTG TTG CCG CGC 117

AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT CGG AAG ACT 156

TCG GAA CGG TCG CAA CCC CGT GGG CGG CGC CAG CCT ATT 195

CCC AAG GCG CGC CAA CCC ACG GGC CGG TCC TGG GGT CAA 234

CCC GGG TAC CCT TGG CCC CTT TAC GCC AAT GAG GGC CTC 273

GGG TGG GCA GGG TGG TTG CTC TCC CCC CGA GGC TCT CGG 312

CCT AAT TGG GGC CCC AAT GAC CCC CGG CGG AAG TCG CGC 351

AAT TTG GGT AAG GTC ATC GAT ACC CTA ACG TGC GGA TTC 390

GCC GAC CTC ATG GGG TAC ATC CCG CTC GTA GGC GGC CCC 429

GTT GGG GGC GTC GCA AGG GCT CTC GCA CAC GGT GTG AGG 468

GTT CTT GAG GAC GGG GTA AAC TAC GCA ACA GGG AAT TTG 507

CCC GGT TGC TCT TTC TCT ATC TTT ATC CTT GCA CTT CTT 546

TCC TGT CTG ATC ATC CCG GCC TCT GCA 573


(2) INFORMATION FOR SEQ ID NO: 150: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA3 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 150: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39

AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78

GGT GGT CAG ATC GTT GGT GGA GTT TAC TTG TTG CCG CGC 117

AGG GGC CCC AGG TTG GGT GTG CGC GCG ACT CGG AAG ACT 156

TCA GAA CGG TCG CAA CCC CGT GGA CGG CGC CAG CCT ATT 195

CCC AAG GCT CGC CAG CCC ACG GGC CGG TCC TGG GGT CAA 234

CCC GGG TAC CCT TGG CCC CTT TAC GCC AAT GAG GGC CTC 273

GAG TGG GCA GGG TGG TTG CTC TCC CCC CGA GGC TCT CGG 312

CCT AGT TGG GGC CCC AAC GAC CCC CGG CGG AAA TCG CGC 351

AAT TTG GGT AAG GTC ATC GAT ACC CTA ACG TGC GGA TTC 390

GCC GAT CTC ATG GGG TAC ATC CCG CTC GTA GGC GGC CCC 429

GTT GGG GGC GTC GCA AGG GCT CTC GCA CAT GGT GTG AGG 468

GTT CTT GAG GAC GGG GTA AAC TAC GCA ACA GGG AAT TTA 507

CCC GGT TGC TCT TTC TCT ATC TTT ATC CTT GCA CTT CTT 546

TCA TGC CTG ACC GTC CCG GCC TCT GCA 573


(2) INFORMATION FOR SEQ ID NO: 151: 


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(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA13 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 151: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39 

AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78 

GGT GGT CAG ATC GTT GGT GGA GTT TAC TTG TTG CCG CGC 117

AGG GGC CCT AGG TTG GGT GTG CGC GCA ACT CGG AAG ACT 156 

TCA GAA CGG TCG CAA CCC CGT GGA CGG CGT CAG CCT ATC 195 

CCC AAG GCG CGC CAG CCC ACG GGC CGG TCC TGG GGT CAA 234 

CCC GGG TAC CCT TGG CCC CTT TAT GCC AAT GAG GGC CTC 273 

GGG TGG GCA GGG TGG TTG CTC TCC CCC CGA GGC TCT CGG 312 

CCT AAT TGG GGC CCC AAT GAC CCC CGG CGG AAA TCG CGC 351 

AAC TTG GGT AAG GTC ATC GAT ACC CTG ACG TGC GGA TTC 390 

GCC GAC CTC ATG GGG TAC ATC CCG CTC GTA GGC GGC CCC 429

GTT GGG GGC GTC GCA AGG GCT CTC GCA CAC GGT GTG AGG 468 

GTC CTT GAG GAC GGG GTA AAC TAT GCA ACA GGG AAT TTA 507 

CCC GGT TGC TCT TTC TCT ATC TTT ATC CTT GCA CTT CTT 546 

TCA TGC CTG ACT GTC CCG ACC TCT GCC 573 


(2) INFORMATION FOR SEQ ID NO: 152:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA6 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 152: 


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC CAA AGA 39 

AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78 

GGT GGT CAG ATC GTT GGT GGA GTT TAC TTG TTG CCG CGC 117

AGG GGC CCT CGT ATG GGT GTG CGC GCG ACT CGG AAG ACT 156 

TCG GAA CGG TCG CAA CCC CGT GGA CGG CGT CAG CCT ATT 195 

CCC AAG GCG CGC CAA TCC GCG GGT CGG TCC TGG GGT CAA 234 

CCC GGG TAC CCT TGG CCC CTT TAC GCC AAT GAG GGC CTC 273 

GGG TGG GCA GGG TGG TTG CTC TCC CCC CGA GGC TCT CGG 312 

CCT AAT TGG GGC CCC AAT GAC CCC CGG CGA AAA TCG CGC 351 

AAT TTG GGT AAG GTC ATC GAT ACC CTA ACG TGC GGA TTC 390

GCC GAC CTC ATG GGG TAC ATC CCG CTC GTA GGC GGC CCC 429

GTT GGG GGC GTC GCA AGG GCT CTC GCA CAC GGT GTG AGG 468

GTT CTT GAG GAC GGG GTA AAC TAT GCA ACA GGG AAT TTG 507

CCC GGT TGC TCT TTC TCT ATC TTT GTC CTT GCA CTT CTC 546

TCG TGC CTA ACC GTC CCT GCC TCT GCA 573


(2) INFORMATION FOR SEQ ID NO: 153:


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 573 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA11 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 153:


ATG AGC ACG AAT CCT AAA CCT CAA AGA AAA ACC AAA AGA 39

AAC ACC AAC CGC CGC CCA CAG GAC GTC AAG TTC CCG GGC 78

GGT GGT CAG ATC GTT GGT GGA GTT TAC TTG TTG CCG CGC 117

AGG GGC CCT AGG TTG GGT GTG CGC GCG ACT CGG AAG ACT 156

TCA GAA CGG TCG CAA CCC CGT GGG CGG CGT CAG CCT ATT 195

CCC AAG GCG CGC CAA CCC ACG GGC CGG TCC TGG GGT CAA 234

CCC GGG TAC CCT TGG CCC TTT TAC GCC AAT GAG GGC CTC 273

GGG TGG GCA GGG TGG CTG CTC TCC CCT CGA GGC TCT CGG 312

CCT AAC TGG GGC CCC AAT GAC CCC CGG CGA AGA TCG CGC 351

AAT TTG GGC AAG GTC ATC GAT ACC CTA ACG TGC GGA TTC 390

GCC GAC CTC ATG GGG TAC ATC CCG CTC GTA GGC GGC CCC 429

GTT GGG GGC GTC GCA AGG GCC CTC GCA CAC GGT GTG AGA 468

GCT CTT GAG GAC GGG GTA AAT TAT GCA ACA GGG AAT CTT 507

CCC GGT TGC TCT TTC TCC ATC TTT ATC CTT GCA CTT CTC 546

TCG TGC TTG ACC GTC CCG GCC ACT GCA 573

(2) INFORMATION FOR SEQ ID NO: 154:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 573 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: HK2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 154:


ATG AGC ACA CTT CCA AAA CCC CAA AGA AAA ACC AAA AGA 39

AAC ACC AAC CGT CGC CCA ACG GAC GTC AAG TTC CCG GGT 78

GGC GGT CAG ATC GTT GGC GGA GTT TAC TTG TTG CCG CGC 117

AGG GGC CCC CGG TTG GGT GTG CGC GCG ACG AGA AAG ACT 156

TCC GAG CGA TCC CAG CCC AGA GGC AGG CGC CAA CCT ATA 195

CCA AAG GCG CGC CAG CCC CAG GGC AGG CAC TGG GCT CAG 234

CCC GGA TAC CCT TGG CCT CTT TAT GGA AAC GAG GGC TGT 273

GGG TGG GCA GGT TGG CTC CTG TCC CCC CGC GGC TCC CGG 312

CCA CAT TGG GGC CCC AAT GAC CCC CGG CGT CGA TCC CGG 351

AAT TTG GGT AAG GTC ATC GAT ACC CTA ACG TGT GGG TTC 390

GCC GAT CTC ATG GGG TAC ATT CCC GTC GTG GGC GCG CCT 429

TTG GGC GGC GTC GCG GCT GCG CTC GCA CAT GGC GTG AGG 468

GCA ATC GAG GAC GGG ATC AAT TAT GCA ACA GGG AAT CTC 507

CCC GGT TGC TCT TTC TCT ATC TTC CTT TTG GCA CTA CTC 546

TCG TGC CTC ACA ACG CCA GCT TCG GCT 573


(2) INFORMATION FOR SEQ ID NO: 155: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK7 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 155: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Pro Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Val Pro Ala Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 156: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: US11 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 156:


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Val Pro Ala Ser Ala
185 190


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(2) INFORMATION FOR SEQ ID NO: 157: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S14 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 157: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Val Pro Ala Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 158:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 191 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: SW1


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 158:


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Val Pro Ala Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 159: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S18 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 159:

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Val Pro Ala Ser Ala
185 190


20 


25 


30 


(2) INFORMATION FOR SEQ ID NO : 160: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DR4

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 160:

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Val Pro Ala Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 161:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA10 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 161: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Gln
60 65 70

Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Pro Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Ile Pro Ala Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 162: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S45 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 162: 

Met Ser Thr Asn Pro Lys Pro Gln Arg Ala Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Pro Glu Gly Arg Ala Trp Ala Gln Pro Gly His Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Ile Pro Ala Ser Ala
185 190

(2) INFORMATION FOR SEQ ID NO: 163:


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: D1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 163: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Pro Glu Gly Arg Ala Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Ile Pro Ala Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 164:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: US6 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 164: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Pro Glu Gly Arg Ala Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Met Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Ile Pro Ala Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 165: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: P10

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 165:

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Pro Glu Gly Arg Ala Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Ile Pro Ala Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 166:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 166: 

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Pro Glu Gly Arg Ala Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Met Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Asn Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Ile Pro Ala Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 167: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: T10 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 167: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Gln
60 65 70

Pro Glu Gly Arg Ala Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Met Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Ile Pro Ala Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 168: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SW2 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 168: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Gln
60 65 70

Pro Glu Gly Arg Ala Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Met Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Ile Pro Ala Ser Ala
185 190

(2) INFORMATION FOR SEQ ID NO: 169:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: IND3 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 169: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Pro Glu Gly Arg Ala Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Ile Pro Ala Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 170: 


(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 191 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: IND8

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 170:

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Pro Glu Gly Arg Ala Trp Ala Gln Pro Gly His Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Val Pro Ala Ser Ala
185 190








(2) INFORMATION FOR SEQ ID NO: 171: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 


(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: S9 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 171: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg His
60 65 70

Pro Glu Gly Arg Ala Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Asn Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Ala Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Ile Pro Ala Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 172: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: HK3 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 172: 

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Gln
60 65 70

Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Met Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Asn Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Thr Pro Ala Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 173: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 


(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: HK5 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 173:

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Met Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro His Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Ile Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Thr Pro Val Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 174:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: HK4 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 174:


Lys 

Pro 

Gin 

Arg 

Lys 

10 

Thr 

Lys 

Arg 

Asn 

Gin 

20 

Asp 

Val 

Lys 

Phe 

Pro 

25 

Gly 

Gly 

Gly 

Val 

Tyr 

35 

Leu 

Leu 

Pro 

Arg 

Arg 

40 

Gly 

Pro 

Ala 

Thr 

Arg 

50 

Lys 

Thr 

Ser 

Glu 

Arg 

55 

Ser 

Arg 

Gin 

Pro 

lie 

65 

Pro 

Lys 

Ala 

Arg 

Gin 

70 

Trp 

Ala 

Gin 

Pro 

Gly 

80 

Tyr 

Pro 

Trp 

Pro 

Gly 

90 

Met 

Gly 

Trp 

Ala 

Gly 

95 

Trp 

Leu 

Leu 

Arg 

Pro 

105 

Ser 

Trp 

Gly 

Pro 

Thr 

110 

Asp 

Pro 

Asn 

Leu 

Gly 

120 

Lys 

Val 

lie 

Asp 

Thr 

125 

Leu 


Thr

Cys 

Gly 

Phe 

130 

Ala 

Asp 

Leu 

Gly 

Ala 

Pro 

Leu 

Gly 

145 

Gly 

Val 

Val 

155 

Arg 

Val 

Val 

Glu 

Asp 

160 

Gly 

Leu 

Pro 

170 

Gly 

Cys 

Ser 

Phe 

Ser 

175 

Ser 

Cys 

Leu 

185 

Thr 

He 

Pro 

Ala 


Met 

Gly 

135 

Tyr 

lie 

Pro 

Leu 

Val 

140 

Ala 

Arg 

Ala 

150 

Leu 

Ala 

His 

Gly 

Val 

Asn 

Tyr 

Ala 

165 

Thr 

Gly 

Asn 

lie 

Ser 

190 

Phe 

Ala 

Leu 

Leu 

Ala 

180 

Leu 

Leu 




(2) INFORMATION FOR SEQ ID NO: 175: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: P8 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 175: 


Met Ser Thr Thr Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Ser Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Pro Glu Gly Arg Ala Trp Ala Gln Pro Gly His Pro Trp Pro
75 80

Leu Tyr Ala Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Gly Pro Leu Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Val Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Ile Pro Ala Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 176:


(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 191 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: T3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 176:

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Pro Glu Gly Arg Ala Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asp Glu Gly Met Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Asn Trp Gly Pro Thr Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Leu Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Ile Pro Ala Ser Ala
185 190

(2) INFORMATION FOR SEQ ID NO: 177:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: T4 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 177: 

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Asp Arg Arg
60 65 70

Ser Thr Gly Lys Ser Trp Gly Lys Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Asn Asp Pro
100 105 110

Arg His Arg Ser Arg Asn Val Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Ser Leu Ala Asp Leu Met Gly Tyr Val Pro Val Val
130 135 140

Gly Gly Pro Leu Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Ile Thr Ile Pro Val Ser Ala
185 190

(2) INFORMATION FOR SEQ ID NO: 178: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: US10 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 178: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Asp Arg Arg
60 65 70

Pro Thr Gly Lys Ser Trp Gly Lys Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg His Arg Ser Arg Asn Val Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Val Val
130 135 140

Gly Ala Pro Leu Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Ile Thr Ile Pro Val Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 179:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 191 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: T9

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 179:

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Ile Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Thr Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Asp Arg Arg
60 65 70

Ser Thr Gly Lys Ser Trp Gly Lys Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Ser Asp Pro
100 105 110

Arg His Arg Ser Arg Asn Val Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Val Val
130 135 140

Gly Ala Pro Leu Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Ile Thr Thr Pro Ala Ser Ala
185 190








(2) INFORMATION FOR SEQ ID NO: 180:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 191 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: T2


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 180: 


Met Ser Thr Ile Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Asp Arg Arg
60 65 70

Ser Thr Gly Lys Ser Trp Gly Lys Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Asn Asp Pro
100 105 110

Arg His Arg Ser Arg Asn Val Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Val Val
130 135 140

Gly Ala Pro Leu Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Ile Thr Ile Pro Val Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 181: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: T8 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 181:


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Asp Arg Arg
60 65 70

Ser Thr Gly Lys Ser Trp Gly Lys Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Thr Trp Gly Pro Thr Asp Pro
100 105 110

Arg His Arg Ser Arg Asn Leu Gly Arg Val Ile Asp Thr Ile
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Val Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Ile Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Phe Thr Val Pro Val Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 182: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: US1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 182: 



Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Asp Arg Arg
60 65 70

Ser Thr Gly Lys Ser Trp Gly Lys Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Thr Trp Gly Pro Thr Asp Pro
100 105 110

Arg His Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Ile
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Val Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Ile Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Ala Thr Val Pro Val Ser Ala
185 190








(2) INFORMATION FOR SEQ ID NO: 183:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK11 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 183: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Thr Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Asp Arg Arg
60 65 70

Ser Thr Gly Lys Pro Trp Gly Lys Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser His Pro Asn Trp Gly Pro Thr Asp Pro
100 105 110

Arg His Lys Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Ile
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Val Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Ile Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Cys Thr Val Pro Val Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 184: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 




(C) INDIVIDUAL ISOLATE: SW3 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 184: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Asp Arg Arg
60 65 70

Ser Thr Gly Lys Ser Trp Gly Lys Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser His Pro Asn Trp Gly Pro Thr Asp Pro
100 105 110

Arg His Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Ile
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Val Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Ile Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Phe Thr Val Pro Val Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 185: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK8 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 185: 

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Ser Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Asp Arg Arg
60 65 70

Ser Thr Gly Lys Ser Trp Gly Lys Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Thr Trp Gly Pro Thr Asp Pro
100 105 110

Arg His Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Ile
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Val Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Ile Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Cys Thr Val Pro Val Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 186:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 191 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: S83

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 186:


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Asp Arg Arg
60 65 70

Thr Thr Gly Lys Ser Trp Gly Arg Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
100 105 110

Arg His Lys Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Val Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Val Leu Glu Asp Gly Ile Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Ile Ser Val Pro Val Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 187: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: HK10

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 187:

Met Ser Thr Leu Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Ile Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Val Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Ser Glu Gly Arg Ser Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Asn Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Ala Leu Glu Asp Gly Ile Asn Phe Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Phe
170 175 180

Ser Cys Leu Ile His Pro Ala Ala Ser
185 190

(2) INFORMATION FOR SEQ ID NO: 188:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S52 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 188: 


Met Ser Thr Leu Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Ile Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Val Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Ser Glu Gly Arg Ser Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Asn Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Ala Leu Glu Asp Gly Ile Asn Phe Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Phe
170 175 180

Ser Cys Leu Val His Pro Ala Ala Ser
185 190

(2) INFORMATION FOR SEQ ID NO: 189:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: S2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 189:


Met Ser Thr Leu Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Ile Arg Arg Pro Gln Asp Ile Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Val Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Ser Glu Gly Arg Ser Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Asn Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Ala Leu Glu Asp Gly Ile Asn Phe Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Phe
170 175 180

Ser Cys Leu Ile His Pro Ala Ala Ser
185 190


(2) INFORMATION FOR SEQ ID NO: 190:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 
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(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK12 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 190:


Met Ser Thr Leu Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Ile Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Val Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Ser Glu Gly Arg Ser Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Asn Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Ala Leu Glu Asp Gly Ile Asn Phe Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Phe
170 175 180

Ser Cys Leu Ile His Pro Ala Ala Ser
185 190


(2) INFORMATION FOR SEQ ID NO: 191: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z4 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 191:

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Met Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Gln
60 65 70

Pro Glu Gly Arg Ser Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Asn Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Ile Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Ala Val Glu Asp Gly Ile Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Val Pro Ala Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 192: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z8 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 192: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Met Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Ser Glu Gly Arg Ser Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Asn Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Ala Val Glu Asp Gly Ile Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Val Pro Ala Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 193: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 193: 

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Met Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Ala Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Ser Glu Gly Arg Ser Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Asn Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Ala Val Glu Asp Gly Ile Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Thr Pro Ala Ser Ala
185 190








(2) INFORMATION FOR SEQ ID NO: 194: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z5 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 194: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Met Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Gln Ala Arg Arg
60 65 70

Ser Glu Gly Arg Ser Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Gln Asn Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Ala Leu Glu Asp Gly Ile Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Phe
170 175 180

Ser Cys Leu Thr Thr Pro Ala Ser Ala
185 190

(2) INFORMATION FOR SEQ ID NO: 195:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 191 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: Z6

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 195:

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Met Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Ser Glu Gly Arg Ser Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Asn Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Ala Val Glu Asp Gly Ile Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Val Pro Thr Ser Ala
185 190


(2) INFORMATION FOR SEQ ID NO: 196: 

(i) SEQUENCE CHARACTERISTICS:


(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: Z7 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 196: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Met Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Thr Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Arg
60 65 70

Ser Glu Gly Arg Ser Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Asn Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Ala Leu Glu Asp Gly Ile Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Val Pro Ala Ser Ala
185 190

(2) INFORMATION FOR SEQ ID NO: 197:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: DK13

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 197:

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1 5 10

Thr Asn Arg Arg Pro Met Asp Val Lys Phe Pro Gly Gly Gly
15 20 25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
30 35 40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
45 50 55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Gln
60 65 70

Leu Glu Gly Arg Ser Trp Ala Gln Pro Gly Tyr Pro Trp Pro
75 80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85 90 95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Asn Asp Pro
100 105 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
115 120 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Val Val
130 135 140

Gly Ala Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
145 150

Val Arg Leu Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155 160 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
170 175 180

Ser Cys Leu Thr Val Pro Ala Ser Ala
185 190

(2) INFORMATION FOR SEQ ID NO: 198:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA4 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 198: 

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1               5                   10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15                  20                  25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
    30                  35                  40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
        45                  50                  55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Gln
            60                  65                  70

Pro Thr Gly Arg Ser Trp Gly Gln Pro Gly Tyr Pro Trp Pro
                75                  80

Leu Tyr Ala Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85                  90                  95

Ser Pro Arg Gly Ser Arg Pro Asn Trp Gly Pro Asn Asp Pro
    100                 105                 110

Arg Arg Lys Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
        115                 120                 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
            130                 135                 140

Gly Gly Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
                145                 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155                 160                 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Ile Leu Ala Leu Leu
    170                 175                 180

Ser Cys Leu Thr Val Pro Ala Ser Ala
        185                 190

(2) INFORMATION FOR SEQ ID NO: 199: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA5 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 199:

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1               5                   10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15                  20                  25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
    30                  35                  40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
        45                  50                  55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Gln
            60                  65                  70

Pro Thr Gly Arg Ser Trp Gly Gln Pro Gly Tyr Pro Trp Pro
                75                  80

Leu Tyr Ala Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85                  90                  95

Ser Pro Arg Gly Ser Arg Pro Asn Trp Gly Pro Asn Asp Pro
    100                 105                 110

Arg Arg Lys Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
        115                 120                 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
            130                 135                 140

Gly Gly Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
                145                 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155                 160                 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Ile Leu Ala Leu Leu
    170                 175                 180

Ser Cys Leu Thr Val Pro Ala Ser Ala
        185                 190








(2) INFORMATION FOR SEQ ID NO: 200:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 191 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: SA7

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 200:

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1               5                   10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15                  20                  25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
    30                  35                  40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
        45                  50                  55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Gln
            60                  65                  70

Pro Thr Gly Arg Ser Trp Gly Gln Pro Gly Tyr Pro Trp Pro
                75                  80

Leu Tyr Ala Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85                  90                  95

Ser Pro Arg Gly Ser Arg Pro Asn Trp Gly Pro Asn Asp Pro
    100                 105                 110

Arg Arg Lys Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
        115                 120                 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
            130                 135                 140

Gly Gly Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
                145                 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155                 160                 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Ile Leu Ala Leu Leu
    170                 175                 180

Ser Cys Leu Thr Val Pro Ala Ser Ala
        185                 190







(2) INFORMATION FOR SEQ ID NO: 201:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 201: 

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1               5                   10

Thr Asn Leu Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15                  20                  25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
    30                  35                  40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
        45                  50                  55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Gln
            60                  65                  70

Pro Thr Gly Arg Ser Trp Gly Gln Pro Gly Tyr Pro Trp Pro
                75                  80

Leu Tyr Ala Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85                  90                  95

Ser Pro Arg Gly Ser Arg Pro Asn Trp Gly Pro Asn Asp Pro
    100                 105                 110

Arg Arg Lys Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
        115                 120                 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
            130                 135                 140

Gly Gly Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
                145                 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155                 160                 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Ile Leu Ala Leu Leu
    170                 175                 180

Ser Cys Leu Ile Ile Pro Ala Ser Ala
        185                 190

(2) INFORMATION FOR SEQ ID NO: 202:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA3 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 202: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1               5                   10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15                  20                  25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
    30                  35                  40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
        45                  50                  55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Gln
            60                  65                  70

Pro Thr Gly Arg Ser Trp Gly Gln Pro Gly Tyr Pro Trp Pro
                75                  80

Leu Tyr Ala Asn Glu Gly Leu Glu Trp Ala Gly Trp Leu Leu
85                  90                  95

Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Asn Asp Pro
    100                 105                 110

Arg Arg Lys Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
        115                 120                 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
            130                 135                 140

Gly Gly Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
                145                 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155                 160                 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Ile Leu Ala Leu Leu
    170                 175                 180

Ser Cys Leu Thr Val Pro Ala Ser Ala
        185                 190








(2) INFORMATION FOR SEQ ID NO: 203: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens

(C) INDIVIDUAL ISOLATE: SA13

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 203:

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1               5                   10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15                  20                  25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
    30                  35                  40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
        45                  50                  55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Gln
            60                  65                  70

Pro Thr Gly Arg Ser Trp Gly Gln Pro Gly Tyr Pro Trp Pro
                75                  80

Leu Tyr Ala Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85                  90                  95

Ser Pro Arg Gly Ser Arg Pro Asn Trp Gly Pro Asn Asp Pro
    100                 105                 110

Arg Arg Lys Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
        115                 120                 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
            130                 135                 140

Gly Gly Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
                145                 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155                 160                 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Ile Leu Ala Leu Leu
    170                 175                 180

Ser Cys Leu Thr Val Pro Thr Ser Ala
        185                 190


(2) INFORMATION FOR SEQ ID NO: 204:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA6 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 204:

Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Gln Arg Asn
1               5                   10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15                  20                  25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
    30                  35                  40

Arg Met Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
        45                  50                  55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Gln
            60                  65                  70

Ser Ala Gly Arg Ser Trp Gly Gln Pro Gly Tyr Pro Trp Pro
                75                  80

Leu Tyr Ala Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85                  90                  95

Ser Pro Arg Gly Ser Arg Pro Asn Trp Gly Pro Asn Asp Pro
    100                 105                 110

Arg Arg Lys Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
        115                 120                 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
            130                 135                 140

Gly Gly Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
                145                 150

Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155                 160                 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Val Leu Ala Leu Leu
    170                 175                 180

Ser Cys Leu Thr Val Pro Ala Ser Ala
        185                 190


(2) INFORMATION FOR SEQ ID NO: 205:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: SA11 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 205: 


Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1               5                   10

Thr Asn Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly
15                  20                  25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
    30                  35                  40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
        45                  50                  55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Gln
            60                  65                  70

Pro Thr Gly Arg Ser Trp Gly Gln Pro Gly Tyr Pro Trp Pro
                75                  80

Phe Tyr Ala Asn Glu Gly Leu Gly Trp Ala Gly Trp Leu Leu
85                  90                  95

Ser Pro Arg Gly Ser Arg Pro Asn Trp Gly Pro Asn Asp Pro
    100                 105                 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
        115                 120                 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
            130                 135                 140

Gly Gly Pro Val Gly Gly Val Ala Arg Ala Leu Ala His Gly
                145                 150

Val Arg Ala Leu Glu Asp Gly Val Asn Tyr Ala Thr Gly Asn
155                 160                 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Ile Leu Ala Leu Leu
    170                 175                 180

Ser Cys Leu Thr Val Pro Ala Thr Ala
        185                 190

(2) INFORMATION FOR SEQ ID NO: 206:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 191 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown

(vi) ORIGINAL SOURCE:

(A) ORGANISM: homosapiens 

(C) INDIVIDUAL ISOLATE: HK2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 206: 

Met Ser Thr Leu Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn
1               5                   10

Thr Asn Arg Arg Pro Thr Asp Val Lys Phe Pro Gly Gly Gly
15                  20                  25

Gln Ile Val Gly Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro
    30                  35                  40

Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser Glu Arg Ser
        45                  50                  55

Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg Gln
            60                  65                  70

Pro Gln Gly Arg His Trp Ala Gln Pro Gly Tyr Pro Trp Pro
                75                  80

Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu
85                  90                  95

Ser Pro Arg Gly Ser Arg Pro His Trp Gly Pro Asn Asp Pro
    100                 105                 110

Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu
        115                 120                 125

Thr Cys Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Val Val
            130                 135                 140

Gly Ala Pro Leu Gly Gly Val Ala Ala Ala Leu Ala His Gly
                145                 150

Val Arg Ala Ile Glu Asp Gly Ile Asn Tyr Ala Thr Gly Asn
155                 160                 165

Leu Pro Gly Cys Ser Phe Ser Ile Phe Leu Leu Ala Leu Leu
    170                 175                 180

Ser Cys Leu Thr Thr Pro Ala Ser Ala
        185                 190

(2) INFORMATION FOR SEQ ID NO: 207:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 207: 

GCGTCCGGGT TCTGGAAGAC GGCGTGAACT ATGCAACAGG 40 


(2) INFORMATION FOR SEQ ID NO: 208: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:208: 

AGGCTTTCAT TGCAGTTCAA GGCCGTGCTA TTGATGTGCC 40 


(2) INFORMATION FOR SEQ ID NO: 209:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 209:

AAGACGGCGT GAACTATGCA ACAGGGAACC TTCCTGGTTG 40


(2) INFORMATION FOR SEQ ID NO: 210:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 210: 

AGTTCAAGGC CGTGCTATTG ATGTGCCAAC TGCCGTTGGT 40 




(2) INFORMATION FOR SEQ ID NO: 211: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 211: 


AAGACGGCGT GAATTCTGCA ACAGGGAACC TTCCTGGTTG 40


(2) INFORMATION FOR SEQ ID NO: 212:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 212:

AGTTCAAGGC CGTGGAATTC ATGTGCCAAC TGCCGTTGGT 40


(2) INFORMATION FOR SEQ ID NO: 213:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 213:

ARCTYCGACG TYACATCGAY CTGCTYGTYG GRAGYGCCAC CC 42 


(2) INFORMATION FOR SEQ ID NO: 214: 


(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 214: 

RCARGCCRTC TTGGAYATGA TCGCTGGWGC Y 31 




(2) INFORMATION FOR SEQ ID NO: 215: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 215:

CRATACGACR YCAYGTCGAY TTGCTCGTTG GGGCGGCTRY YT 42


(2) INFORMATION FOR SEQ ID NO: 216: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 216: 

RCAAGCTRTC RTGGAYRTGG TRRCRGGRGC C 31 


(2) INFORMATION FOR SEQ ID NO: 217:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 217:

TTGCGGACKC ACATYGACAT GGTYGTGATG TCCGCCACGC 40


(2) INFORMATION FOR SEQ ID NO: 218:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 43 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 218: 

GATGCGCGTT CCCGAGGTCA TCWTAGACAT CRTYRGCGGR GCD 43 




(2) INFORMATION FOR SEQ ID NO: 219:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 219: 

AATGGCACCY TGCRCTGCTG GATACAAGTR ACACCTAATG TGGCTGTGAA 50
ACAC 54


(2) INFORMATION FOR SEQ ID NO: 220: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 220: 

TGARCTAGYC CTYSARGTYG TCTTCGGYGG Y 31


(2) INFORMATION FOR SEQ ID NO: 221: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 221: 

GCCAACGTCT CTCGATGTTG GGTGCCGGTT GCCCCCAATC TCGCCATAAG 50 
TCAA 54 


(2) INFORMATION FOR SEQ ID NO: 222:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 222: 

AAGGGCCTGC GAGCACACAT CGATATCATC GTGATGTCTG CTACGG 46 




(2) INFORMATION FOR SEQ ID NO: 223: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 223: 

TTGGTGCGCA TCCCGGAAGT CATCTTGGAT ATTGTTACAG GAGGT 45


(2) INFORMATION FOR SEQ ID NO: 224: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 224:

AGTCAGGTAY GTCGGAGCAA CCACCGCYTC GATACGCAGT 40


(2) INFORMATION FOR SEQ ID NO: 225:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 225:

AGCCTTCACG TTCAGACCKC GTCGCCATCA AACRGTCCAG ACCTGT 46


(2) INFORMATION FOR SEQ ID NO: 226:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 75 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 226: 

TCCCCCGCYG TGGGTATGGT GGTRGCGCAC RTYCTGCGDY TGCCCCAGAC 50 
CKTGTTYGAC ATAMTRGCYG GGGCC 75 


(2) INFORMATION FOR SEQ ID NO: 227:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 227:

ACGCCGGTGA CGCCTACAGT GGCTGTCGCA CACCCGGGC 39 


(2) INFORMATION FOR SEQ ID NO: 228:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 228: 

ATGAGGGTCC CCACAGCCTT TCTCGACATG GTTGCCGGAG GC 42


(2) INFORMATION FOR SEQ ID NO: 229: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 229: 

CGCGCCCTAT CCCAACGCAC CGTTAGAGTC CATGCGCAGG 40 


(2) INFORMATION FOR SEQ ID NO: 230:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 49 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 230: 

TCAGATCTTA CGGATCCCCT CTATCCTAGG TGACTTGCTC ACCGGGGGT 49


(2) INFORMATION FOR SEQ ID NO: 231:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 231:

CAGTCACGCT GCTGGGTGGC CCTTACTCCC ACCGTGGCGG YGYCTTATAT 50
CGGT 54


(2) INFORMATION FOR SEQ ID NO: 232: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 232:

TAGCACTCTG GTRGAYCTAC TCRCTGGAGG G 31


(2) INFORMATION FOR SEQ ID NO: 233: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 233: 

AAGTCTACAT GCTGGGTGTC TCTCACCCCC ACCGTGGCTG CGCAACATCT 50 

54


(2) INFORMATION FOR SEQ ID NO: 234:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 234:

AGGCGCCATG GTCGACCTGC TTGCAGGCGG C 31


(2) INFORMATION FOR SEQ ID NO: 235:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 43 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 235:

TCAGCCCCGA VYYTCGGAGC GGTCACGGCT CCTCTTCGGA GGG 43


(2) INFORMATION FOR SEQ ID NO: 236:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 236:

TGYTACGGAT YCCCCARGTG GTCATHGACA TCATWGCCGG GGSC 44


(2) INFORMATION FOR SEQ ID NO: 237: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 237:

CATACCAAAT GCTTCCACGC CCGCAACGGG ATTCCGCAGG 40


(2) INFORMATION FOR SEQ ID NO: 238:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 238: 

TCTTCTTGCG GGCGCCGCAG TGGTTTGCTC ATCCCTG 37


(2) INFORMATION FOR SEQ ID NO: 239:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 52 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 239:

ATCTAGCATC TTGAGGGTAC CTGAGATTTG TGCGAGTGTG ATATTTGGTG 50
GC 52


(2) INFORMATION FOR SEQ ID NO: 240: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 240:

Trp Ile Gln Val Thr Pro Asn Val Ala Val Lys His Arg Gly Ala
                5                   10                  15

Leu Thr His Asn Leu Arg Xaa His Xaa Asp Xaa Ile Val Met Ala
                20                  25                  30

Ala Thr Val


(2) INFORMATION FOR SEQ ID NO: 241: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 241:

Trp Val Pro Val Ala Pro Asn Leu Ala Ile Ser Gln Pro Gly Ala
                5                   10                  15

Leu Thr Lys Gly Leu Arg Ala His Ile Asp Ile Ile Val Met Ser
                20                  25                  30

Ala Thr Val


(2) INFORMATION FOR SEQ ID NO: 242: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:242: 

Trp Ile Pro Val Xaa Pro Asn Val Ala Val Xaa Xaa Pro Gly Ala
                5                   10                  15

Leu Thr Gln Gly Leu Arg Thr His Ile Asp Met Val Val Met Ser
                20                  25                  30

Ala Thr Leu 


(2) INFORMATION FOR SEQ ID NO: 243: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 243:

Trp Thr Xaa Val Thr Pro Thr Val Ala Val Arg Tyr Val Gly Ala
                5                   10                  15

Thr Thr Ala Ser Ile Arg Ser His Val Asp Leu Leu Val Gly Ala
                20                  25                  30

Ala Thr Xaa


(2) INFORMATION FOR SEQ ID NO: 244: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 244:

Trp Val Ala Leu Xaa Pro Thr Leu Ala Ala Arg Asn Xaa Xaa Xaa
                5                   10                  15

Xaa Thr Xaa Xaa Ile Arg Xaa His Val Asp Leu Leu Val Gly Ala
                20                  25                  30

Ala Xaa Phe


(2) INFORMATION FOR SEQ ID NO: 245: 


(i) SEQUENCE CHARACTERISTICS: 





(A) LENGTH: 33 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 245:

Trp Val Xaa Xaa Xaa Pro Thr Val Ala Thr Arg Asp Gly Lys Leu
                5                   10                  15

Pro Xaa Xaa Gln Leu Arg Arg Xaa Ile Asp Leu Leu Val Gly Ser
                20                  25                  30

Ala Thr Leu









(2) INFORMATION FOR SEQ ID NO: 246:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 246:

Trp Thr Pro Val Thr Pro Thr Val Ala Val Ala His Pro Gly Ala
                5                   10                  15

Pro Leu Glu Ser Phe Arg Arg His Val Asp Leu Met Val Gly Ala
                20                  25                  30

Ala Thr Leu


(2) INFORMATION FOR SEQ ID NO: 247:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 247:

Trp Val Ala Leu Thr Pro Thr Val Ala Xaa Xaa Tyr Ile Gly Ala
                5                   10                  15

Pro Leu Xaa Ser Xaa Arg Arg His Val Asp Leu Met Val Gly Ala
                20                  25                  30

Ala Thr Val


(2) INFORMATION FOR SEQ ID NO: 248: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 248:

Trp Val Ser Leu Thr Pro Thr Val Ala Ala Gln His Leu Asn Ala
                5                   10                  15

Pro Leu Glu Ser Leu Arg Arg His Val Asp Leu Met Val Gly Gly
                20                  25                  30

Ala Thr Leu


(2) INFORMATION FOR SEQ ID NO: 249: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 249:

Trp Val Pro Leu Thr Pro Thr Val Ala Ala Pro Tyr Pro Asn Ala
                5                   10                  15

Pro Leu Glu Ser Met Arg Arg His Val Asp Leu Met Val Gly Ala
                20                  25                  30

Ala Thr Met


(2) INFORMATION FOR SEQ ID NO: 250: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 250: 


Trp Val Xaa Ile Thr Pro Thr Leu Ser Ala Pro Xaa Xaa Gly Ala
                5                   10                  15

Val Thr Ala Pro Leu Arg Arg Xaa Val Asp Tyr Leu Ala Gly Gly
                20                  25                  30

Ala Ala Leu


(2) INFORMATION FOR SEQ ID NO: 251: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 251:

Trp His Ala Val Thr Pro Thr Leu Ala Ile Pro Asn Ala Ser Thr
                5                   10                  15

Pro Ala Thr Gly Phe Arg Arg His Val Asp Leu Leu Ala Gly Ala
                20                  25                  30

Ala Val Val


(2) INFORMATION FOR SEQ ID NO: 252:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 252: 

Thr Leu Thr Met Ile Leu Ala Tyr Ala Ala Arg Val Pro Glu Leu
                5                   10                  15

Xaa Leu Xaa Val Val Phe Gly Gly
                20


(2) INFORMATION FOR SEQ ID NO: 253: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:253: 


Thr Thr Thr Met Leu Leu Ala Tyr Leu Val Arg Ile Pro Glu Val
                5                   10                  15

Ile Leu Asp Ile Val Thr Gly Gly
                20


(2) INFORMATION FOR SEQ ID NO: 254:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 254:

Thr Xaa Thr Xaa Ile Leu Ala Tyr Xaa Met Arg Val Pro Glu Val
                5                   10                  15

Ile Xaa Asp Ile Xaa Xaa Gly Ala
                20

(2) INFORMATION FOR SEQ ID NO: 255: 


(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 255: 

Ala Val Gly Met Val Val Ala His Xaa Leu Arg Leu Pro Gln Thr 
5 10 15 

Xaa Phe Asp Ile Xaa Ala Gly Ala 
20 


(2) INFORMATION FOR SEQ ID NO: 256: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO:256: 


Thr Xaa Ala Leu Val Xaa Ser Gln Leu Leu Arg Xaa Pro Gln Ala 
5 10 15 

Xaa Xaa Asp Xaa Val Xaa Gly Ala 
20 



(2) INFORMATION FOR SEQ ID NO: 257: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 
(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 257: 

Thr Xaa Ala Leu Val Xaa Ala Gln Leu Leu Arg Xaa Pro Gln Ala 
5 10 15 

Xaa Leu Asp Met Ile Ala Gly Ala 
20 


(2) INFORMATION FOR SEQ ID NO: 258: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 258: 

Thr Thr Thr Leu Leu Leu Ala Gln Ile Met Arg Val Pro Thr Ala 
5 10 15 

Phe Leu Asp Met Val Ala Gly Gly 
20 


(2) INFORMATION FOR SEQ ID NO: 259: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 259: 


Thr Thr Thr Leu Xaa Leu Ala Gln Val Met Arg Ile Pro Ser Thr 
5 10 15 

Leu Val Asp Leu Leu Xaa Gly Gly 
20 


(2) INFORMATION FOR SEQ ID NO: 260: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:260: 
Thr Ala Thr Leu Val Leu Ala Gln Leu Met Arg Ile Pro Gly Ala 
5 10 15 

Met Val Asp Leu Leu Ala Gly Gly 
20 


(2) INFORMATION FOR SEQ ID NO: 261: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 261: 

Thr Ser Ala Leu Ile Met Ala Gln Ile Leu Arg Ile Pro Ser Ile 
5 10 15 

Leu Gly Asp Leu Leu Thr Gly Gly 
20 


(2) INFORMATION FOR SEQ ID NO: 262: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 262: 

Xaa Thr Ala Leu Xaa Met Ala Gln Xaa Leu Arg Ile Pro Gln Val 
5 10 15 

Val Ile Asp Ile Ile Ala Gly Xaa 
20 

(2) INFORMATION FOR SEQ ID NO: 263: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 263: 

Thr Thr Thr Leu Val Leu Ser Ser Ile Leu Arg Val Pro Glu Ile 
5 10 15 

Cys Ala Ser Val Ile Phe Gly Gly 
20 
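
As a reading aid, the three-letter residue rows used throughout this sequence listing can be collapsed into one-letter strings and checked against the declared LENGTH field. The sketch below is only an illustration: the function name `to_one_letter` and the row list `SEQ_263_ROWS` are hypothetical, the rows are transcribed by hand from SEQ ID NO: 263 above, and mapping `Xaa` to `X` is an assumed convention for unspecified residues.

```python
# Three-letter to one-letter amino acid codes.
# "Xaa" (unspecified residue) is mapped to "X" by assumption.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}


def to_one_letter(listing_rows, declared_length):
    """Collapse listing rows into a one-letter string and verify its length."""
    residues = []
    for row in listing_rows:
        for token in row.split():
            if token.isdigit():
                # Position markers (5, 10, 15, ...) are not residues.
                continue
            residues.append(THREE_TO_ONE[token])
    if len(residues) != declared_length:
        raise ValueError(
            f"expected {declared_length} residues, found {len(residues)}"
        )
    return "".join(residues)


# Rows transcribed from SEQ ID NO: 263 (declared LENGTH: 23 amino acids).
SEQ_263_ROWS = [
    "Thr Thr Thr Leu Val Leu Ser Ser Ile Leu Arg Val Pro Glu Ile",
    "5 10 15",
    "Cys Ala Ser Val Ile Phe Gly Gly",
    "20",
]

print(to_one_letter(SEQ_263_ROWS, 23))
```

A length mismatch raises `ValueError`, which is a quick way to spot a residue dropped or duplicated during transcription.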


CLAIMS 


1. A purified and isolated DNA having a 
sequence selected from the group consisting of SEQ ID NO: 1 
through SEQ ID NO: 51. 

2. A purified and isolated protein encoded by a 
gene whose sequence includes a sequence selected from the 
group consisting of SEQ ID NO: 52 through SEQ ID NO: 102. 

3. A purified and isolated DNA having a 
sequence selected from the group consisting of SEQ ID NO: 
103 through SEQ ID NO: 154. 

4. A purified and isolated protein encoded by a 
gene sequence selected from the group consisting of SEQ ID 
NO: 155 through SEQ ID NO: 206. 

5. A purified and isolated protein having an 
amino acid sequence selected from the group consisting of 
SEQ ID NO: 52 through SEQ ID NO: 102 and SEQ ID NO: 155 
through SEQ ID NO: 206. 

6. A method for the recombinant DNA-directed 
synthesis of a protein, said method comprising: 

culturing a transformed or transfected host 
organism containing a DNA sequence capable 
of directing the host organism to produce 
said protein under conditions such that the 
protein is produced, said protein exhibiting 
substantial homology to a protein comprising 
the amino acid sequence selected from the 
group consisting of SEQ ID NO: 52 through SEQ 
ID NO: 102 or SEQ ID NO: 155 through SEQ ID 
NO: 206. 

7. The method of claim 6, wherein the host 
organism is transfected with a recombinant eukaryotic 
expression vector. 

8. The method of claim 7, wherein the host 
organism is a eukaryotic cell. 

9. A recombinant expression vector comprising a 
DNA sequence selected from the group consisting of SEQ ID 
NO: 1 through SEQ ID NO: 51 and SEQ ID NO: 103 through SEQ ID 
NO: 154. 

10. A host organism transformed or transfected 
with a recombinant expression vector according to claim 9. 

11. A method of detecting antibodies against 
HCV, said method comprising: 

(a) contacting a biological sample with at 
least one protein of claim 5 to form an 
immune complex with the antibodies; and 
(b) detecting the presence of the immune 
complex. 


12. The method of claim 11 wherein the 
biological sample is selected from the group consisting of 
serum, saliva or lymphocytes or other mononuclear cells. 


13. The method of claim 11, wherein the 
recombinant protein is bound to a solid support. 


14. The method of claim 11, wherein the immune 
complex is detected using a labeled antibody. 

15. A hepatitis C virus kit comprising: at least 
one protein comprising an amino acid sequence selected from 
the group consisting of: SEQ ID NO: 52 through SEQ ID NO: 102 
and SEQ ID NO: 155 through SEQ ID NO: 206. 

16. A composition comprising at least one 
recombinant protein of claim 5 and an excipient, diluent or 
carrier. 




17. A composition comprising an expression 
vector capable of directing host organism synthesis of a 
protein having an amino acid sequence selected from the 
group consisting of SEQ ID NO: 52 through SEQ ID NO: 102 
and SEQ ID NO: 155 through SEQ ID NO: 206. 

18. A method of preventing hepatitis C 
infection, comprising administering the composition of 
claim 16 or 17 to a mammal in an effective amount to 
stimulate the production of protective antibody. 

19. A vaccine for immunizing a mammal against 
hepatitis C infection, comprising at least one protein 
according to claim 5 in a pharmacologically acceptable 
carrier. 


20. A vaccine for immunizing a mammal against 
hepatitis C infection, said vaccine comprising an 
expression vector capable of directing host organism 
synthesis of a protein having an amino acid sequence 
selected from the group consisting of SEQ ID NO: 52 - SEQ ID 
NO: 102 and SEQ ID NO: 155 - SEQ ID NO: 206. 


21. A method for detecting the presence of the 
hepatitis C virus via a reverse transcription-polymerase 
chain reaction, said method comprising amplifying an HCV 
reverse transcription product by polymerase chain reaction 
using universal primers. 

22. The method of claim 21, wherein said 
universal primers are deduced from universally conserved 
nucleotide domains found in SEQ ID NO: 1 through SEQ ID NO: 
51, in SEQ ID NO: 103 through SEQ ID NO: 154, or in 
consensus sequences shown in Figures 1A-H and 6A-K. 

23. Substantially isolated and purified 
universal primers, wherein said primers have nucleic acid 
sequences derived from universally conserved nucleotide 
domains found in SEQ ID NO: 1 through SEQ ID NO: 51, in SEQ 
ID NO: 103 through SEQ ID NO: 154 and in consensus sequences 
shown in Figures 1A-H and 6A-K. 

24. A diagnostic kit for use in detecting the 
presence of hepatitis C virus in a biological sample, said 
kit comprising at least two universal primers according to 
claim 22. 


25. A diagnostic kit for use in detecting the 
presence of hepatitis C virus in a biological sample, said 
kit comprising at least one nucleic acid sequence selected 
from the group consisting of SEQ ID NO: 1-51 or SEQ ID 
NO: 103-154. 

26. A method for determining the genotype of a 
hepatitis C virus, said method comprising: 

amplifying reverse transcription 
products of RNA via polymerase chain 
reaction using genotype-specific 
amplification primers deduced from 
genotype-specific nucleotide domains 
found in SEQ ID NO: 1 through SEQ ID 
NO: 51, in SEQ ID NO: 103 through SEQ ID 
NO: 154, or in consensus sequences shown 
in Figures 1A-H and 6A-K. 

27. A method for determining the genotype of a 
hepatitis C virus, said method comprising: 



(a) amplifying RNA via reverse 
transcription-polymerase chain reaction 
to produce amplification products; 

(b) contacting said products with at least 
one sequence shown in SEQ ID NO: 1 
through SEQ ID NO: 51 and SEQ ID NO: 103 
through SEQ ID NO: 154; and 

(c) detecting complexes of said product 
which bind to said nucleic acid 
sequence. 

28. A method for determining the genotype of a 
hepatitis C virus, said method comprising: 

(a) amplifying RNA via reverse 
transcription-polymerase chain reaction 
to produce amplification products; 

(b) contacting said products with at least 
one genotype-specific oligonucleotide; 
and 

(c) detecting complexes of said products 
which bind to said oligonucleotide(s). 

29. The method of claims 27 or 28, wherein said 
amplification of step (a) uses universal primers deduced 
from universally conserved nucleotide domains found in SEQ 
ID NO: 1 through SEQ ID NO: 51, in SEQ ID NO: 103 through SEQ 
ID NO: 154, or in consensus sequences shown in Figures 1A-H 
and 6A-K. 


30. The method of claim 28, wherein said 
genotype-specific oligonucleotide of step (b) is a nucleic 
acid sequence deduced from genotype-specific nucleotide 
domains found in SEQ ID NO: 1 through SEQ ID NO: 51 and SEQ 
ID NO: 103 through SEQ ID NO: 154, or in consensus sequences 
shown in Figures 1A-H and 6A-K. 



31. Substantially isolated and purified 
genotype-specific oligonucleotides, wherein said 
oligonucleotides have nucleic acid sequences deduced from 
genotype-specific nucleotide domains found in SEQ ID NO: 1 
through SEQ ID NO: 51, in SEQ ID NO: 103 through SEQ ID 
NO: 154, or in consensus sequences shown in Figures 1A-H and 
6A-K. 


32. Substantially purified and isolated 
genotype-specific peptides having amino acid sequences 
deduced from genotype-specific amino acid domains located 
in SEQ ID NO: 52 through SEQ ID NO: 102, in SEQ ID NO: 155 
through SEQ ID NO: 206, or in consensus sequences shown in 
Figures 2A-H and 7A-K. 

33. A method of detecting antibodies specific 
for a single genotype of HCV, said method comprising: 

(a) contacting a biological sample with at 
least one peptide of claim 32 to form 
an immune complex with the antibodies, 
and 

(b) detecting the presence of the immune 
complex. 

34. The method of claim 33, wherein the 
biological sample is selected from the group consisting of 
serum, saliva or lymphocytes or other mononuclear cells. 

35. The method of claim 33, wherein said peptide 
is bound to a solid support. 

36. The method of claim 33, wherein the immune 
complex is detected using a labelled antibody or antigen. 

37. A kit for use in detecting antibodies 
specific for a single genotype of HCV, said kit comprising: 



at least one peptide selected from the genotype-specific 
peptides of claim 32. 

38. Substantially purified and isolated 
universal peptides having amino acid sequences deduced from 
universally conserved amino acid domains found in SEQ ID 
NO: 52 through SEQ ID NO: 102, in SEQ ID NO: 155 through SEQ 
ID NO: 206, or in consensus sequences shown in Figures 2A-H 
and 7A-K. 


39. A method of detecting antibodies against all 
genotypes of HCV, said method comprising: 

(a) contacting a biological sample with at 
least one peptide of claim 38 to form 
an immune complex with the antibodies, 
and 

(b) detecting the presence of the immune 
complex. 

40. The method of claim 39, wherein the 
biological sample is selected from the group consisting of 
serum, saliva or lymphocytes or other mononuclear cells. 

41. The method of claim 39, wherein said peptide 
is bound to a solid support. 

42. The method of claim 39, wherein the immune 
complex is detected using a labelled antibody or antigen. 

43. A composition comprising at least one 
peptide of claim 32 and an excipient, diluent or carrier. 

44. A composition comprising at least one 
peptide of claim 38 and an excipient, diluent or carrier. 

45. A method of preventing hepatitis C 
infection, comprising administering the composition of 
claims 43 or 44 to a mammal in an effective amount to 
stimulate production of a protective antibody. 

46. A vaccine for immunizing a mammal against 
hepatitis C infection, comprising at least one peptide 
according to claims 32 or 38 in a pharmaceutically 
acceptable carrier. 

47. A composition comprising at least one 
expression vector capable of directing host organism 
synthesis of a genotype-specific peptide having amino acid 
sequence deduced from a genotype-specific amino acid domain 
located in SEQ ID NO: 52 - SEQ ID NO: 102, and SEQ ID NO: 155 
- SEQ ID NO: 206, or in consensus sequences shown in figures 
2A-H and 7A-K. 

48. A composition comprising at least one 
expression vector capable of directing host organism 
synthesis of a universal peptide having amino acid sequence 
deduced from universally conserved amino acid domains found 
in SEQ ID NO: 52 - SEQ ID NO: 102, and SEQ ID NO: 155 - SEQ ID 
NO: 206, or in consensus sequences shown in figures 2A-H and 
7A-K. 


49. A method of preventing hepatitis C 
infection, comprising administering the composition of 
claims 47 or 48 to a mammal in an effective amount to 
stimulate production of a protective antibody. 

50. A vaccine for immunizing a mammal against 
hepatitis C infection, said vaccine comprising at least one 
expression vector capable of directing host organism 
synthesis of a genotype-specific peptide having amino acid 
sequence deduced from a genotype-specific amino acid 
domain located in SEQ ID NO: 52 - SEQ ID NO: 102, and SEQ ID 
NO: 155 - SEQ ID NO: 206, or in consensus sequences shown in 
figures 2A-H and 7A-K. 

51. A vaccine for immunizing a mammal against 
hepatitis C infection, comprising at least one expression 
vector capable of directing host organism synthesis of a 
universal peptide having amino acid sequence deduced from 
universally conserved amino acid domain found in SEQ ID 
NO: 52 - SEQ ID NO: 102, and SEQ ID NO: 155 - SEQ ID NO: 206, 
or in consensus sequences shown in figures 2A-H and 7A-K. 

52. Anti-HCV core antibodies having specific 
binding affinity for core protein of a single genotype of 
HCV. 


53. Anti-HCV envelope 1 antibodies having 
specific binding affinity for envelope 1 protein of a 
single genotype of HCV. 


54. The antibodies of claims 52 or 53 wherein 
said antibodies are monoclonal antibodies. 

55. A method of detecting core protein specific 
for a single genotype of HCV, said method comprising: 

(a) contacting a biological sample with at 
least one antibody of claim 52 to form 
an immune complex with said core 
protein, and 

(b) detecting the presence of the immune 
complex. 

56. A method of detecting E1 protein specific 
for a single genotype of HCV, said method comprising: 

(a) contacting a biological sample with at 
least one antibody of claim 53 to form 
an immune complex with said E1 protein 
and 

(b) detecting the presence of the immune 
complex. 

57. The methods of claims 55 or 56, wherein the 
biological sample is selected from the group consisting of 
serum, saliva, lymphocytes or other mononuclear cells and 
liver. 


58. The method of claims 55 or 56, wherein said 
antibody is bound to a solid support. 

59. A method of detecting antibodies against all 
genotypes of HCV, said method comprising: 

(a) contacting a biological sample with at 
least one universal peptide of claim 38 
to form an immune complex with said 
antibodies; and 

(b) detecting the presence of the immune 
complex. 


ABSTRACT 


The nucleotide and deduced amino acid sequences 
of cDNAs encoding the envelope 1 genes and core genes of 
isolates of hepatitis C virus (HCV) are disclosed. The 
invention relates to the oligonucleotides, peptides and 
recombinant envelope 1 and core proteins derived from these 
sequences and their use in diagnostic methods and vaccines. 



FIGURE 1A 


Isolate 

S14         1 TACCAAGTGCGCAACTCCACGGGGCTTTACCATGTtACCAATGATTGCCCTAACTCGAGTA 
DK7         1 TACCAAGTGCGCAACTCCACGGGGCTTTACCATGTCACCAATGATTGCCCTAACTCGAGTA 
US11        1 TACCAAGTaCGCAACTCCACGGGGCTTTACCATGTCACCAATGATTGCCCTAACTCGAGTA 
DR4         1 CACCAAGTGCGCAACTCTACAGGGCTTTACCATGTCACCAATGATTGCCCTAATTCGAGTA 
DR1         1 CACCAAGTGCGCAACTCTACAGGGCTTTACCATGTCACCAATGATTGCCCTAATTCGAGTA 
DK9         1 TACCAAGTACGCAACTCCtCGGGCCTcTACCATGTCACCAATGATTGCCCTAACTCGAGTA 
S18         1 TACCAAGTACGCAACTCCaCGGGCCTTTACCATGTCACCAATGAcTGCCCTAACTCGAGcA 
SW1         1 TACCAAGTACGCAACTCCtCGGGCCTTTACCATGTCACCAATGAtTGCCCTAACTCGAGtA 
consensus     tACCAAGT-CGCAACTCcaCgGGgCTtTACCATGTcACCAATGAtTGCCCTAAcTCGAGtA 


Isolate 

S14        62 TtGTGTACGAGaCaGCtGATGCtATCCTaCACgCTCCGGGaTGTGTCCCTTGCGTTCGtGA 
DK7        62 TcGTGTACGAGGCGGCCGATGCCATCCTGCACACTCCGGGGTGTGTCCCTTGCGTTCGCGA 
US11       62 TTGTGTACGAGGCGGCCGATGCCATCCTGCACACTCCGGGGTGTGTtCCTTGCGTTCGCGA 
DR4        62 TTGTGTACGAGGCGGCCGATGCCATCCTGCACACGCCGGGGTGTGTCCCTTGCGTTCGCGA 
DR1        62 TTGTGTACGAGGCGGCCGATGCCATCCTGCACgCGCCGGGGTGTGTCCCTTGCGTTCGCGA 
DK9        62 TTGTGTACGAGGCGGCCGATGCCATCCTGCAtTCTCCaGGGTGTGTCCCTTGCGTTCGCGA 
S18        62 TTGTGTACGAGACGGCCGATaCCATCCTACACTCTCCgGGGTGTGTCCCTTGCGTTCGCGA 
SW1        62 TTGTGTACGAGACGGCCGATgCCATtCTACACTCTCCaGGGTGTGTCCCTTGCGTTCGCGA 
consensus     TtGTGTACGAGgCgGCcGATgCcATcCTgCAc-CtCCgGGgTGTGTcCCTTGCGTTCGcGA 


SEQ ID NO:  Isolate 

5           S14       123 GGGTAACacCTCGAGGTGTTGGGTGGCGATGACCCCCACGGTGGCCACCAGGGAcGGCAAA 
1           DK7       123 GGGTAACGtCTCGAGGTGTTGGGTGGCGATGACCCCCACGGTGGCCACCAGGGAtGGCAAA 
8           US11      123 GGGTAACGCtTCGAGGTGTTGGGTGGCGATGACCCCCACGGTGGCCACCAGGGACGGCAAA 
4           DR4       123 GGGTAACaCCTCGAGGTGTTGGGTGGCGGTGACCCCCACGGTGGCCACCAGGGACGGCAAA 
3           DR1       123 GGGTAACGCCTCGAGGTGTTGGGTGGCGGTGACCCCCACGGTGGCCACCAGGGACGGCAAA 
2           DK9       123 GGGTAACGCCTCGAaATGTTGGGTGGCGGTGGCCCCCACGGTGGCCACCAGGGACGGCAAo 
6           S18       123 GGGTAACGCCTCGAgATGTTGGGTGcCGGTGGCCCCCACAGTCGCCACCAGGGACGGCAAA 
7           SW1       123 GGaTggCGCCcCGAagTGTTGGGTGgCGGTGGCCCCCACAGTcGCCACtAGGGACGGCAAA 
1-8         consensus     GGgTaaCgcctCGAggTGTTGGGTGgCGgTGaCCCCCACgGTgGCCACcAGGGAcGGCAAa 



Isolate 

S14       184 CTCCCCgCAaCGCAGCTTCGACGTtACATCGATCTGCTtGTCGGGAGcGCCACCCTCTGTT 
DK7       184 CTCCCCACAgCGCAGCTTCGACGTCACATCGATCTGCTcGTCGGGAGtGCCACCCTCTGTT 
US11      184 CTCCCCACAACGCAaCTTCGACGTCACATCGATCTGCTTGTCGGGAGCGCCACCCTCTGTT 
DR4       184 CTCCCCACAACGCAGCTcCGACGTCACATCGACCTGCTTGTCGGGAGCGCCACCCTCTGCT 
DR1       184 CTCCCCACAACGCAGCTTCGACGTCACATCGACCTGCTTGTCGGGAGCGCCACCCTCTGCT 
DK9       184 CTCCCCGCAACGCAGCTTCGACGTCACATCGATCTGCTTGTCGGGAGCGCCACCCTCTGCT 
S18       184 CTCCCCGCAACGCAGCTTCGACGTCACATCGATCTGCTTGTtGGGAGCGCCACCCTCTGCT 
SW1       184 CTCCCtGCAACGCAGCTTCGACGTCACATCGATCTGCTTGTcGGaAGCGCCACCCTCTGCT 
consensus     CTCCCc-CAaCGCAgCTtCGACGTcACATCGAtCTGCTtGTcGGgAGcGCCACCCTCTGcT 

Isolate 

S14       245 CGGCCCTCTACGTGGGGGACtTGTGCGGGTCTGTCTTTCTTGTCGGTCAgCTGTTTACCTT 
DK7       245 CGGCCCTCTACGTGGGGGACCTGTGCGGGTCTGTCTTTCTTGTCGGTCAACTGTTTACCTT 
US11      245 CGGCCCTCTACGTGGGGGACCTGTGCGGGTCTGTCTTTCTTGTCGGTCAACTGTTTACCTT 
DR4       245 CGGCCCTCTACGTGGGGGACtTGTGCGGGTCTGTCTTCCTTGTCGGTCAACTGTTCACCTT 
DR1       245 CGGCCCTCTACGTGGGGGACcTGTGCGGGTCTGTCTTCCTTGTCGGTCAACTGTTCACCTT 
DK9       245 CGGCCCTCTATGTGGGGGACtTGTGCGGGTCTGTCTTCCTTGTCGGCCAACTGTTCACCTT 
S18       245 CGGCCCTCTATGTGGGGGACcTGTGCGGGTCTGTCTTTCTTGTCAGCCAgCTGTTCACtaT 
SW1       245 CGGCCCTCTAcGTGGGGGACtTGTGCGGGTCTGTCTTTCTcGTCAGtCAaCTGTTCACgtT 
consensus     CGGCCCTCTAcGTGGGGGAC-TGTGCGGGTCTGTCTTtCTtGTCgGtCAaCTGTTcACctT 


SEQ ID NO:  Isolate 

5           S14       306 CTCTCCCAGGCGCCtCTGGACGACGCAAGaCTGCAATTGTTCTATCTATCCcGGCCATATA 
1           DK7       306 CTCTCCCAGGCGCCACTGGACGACGCAAGGCTGCAATTGTTCTATCTATCCtGGCCATATA 
8           US11      306 CTCTCCCAGaCGCCACTGGACGACGCAgGGCTGCAATTGTTCTATCTATCCCGGCCATATA 
4           DR4       306 CTCTCCCAGGCaCCACTGGACAACGCAAGACTGCAATTGTTCcATCTATCCCGGCCATATA 
3           DR1       306 tTCTCCCAGGCGCCACTGGACAACGCAAGACTGCAATTGTTCTATCTATCCCGGCCATATA 
2           DK9       306 CTCCCCCAGaCGCCACTGGACAACGCAAGACTGCAACTGTTCTATCTACCCCGGCCATATt 
6           S18       306 CTCCCCCAGGCGCCACTGGACAACGCAAGACTGCAACTGTTCTATCTACCCCGGCCATATA 
7           SW1       306 CTCCCCCAGGCGCCACTGGACAACGCAAGACTGtAACTGTTCTATCTAtCCCGGCCAcATA 
1-8         consensus     cTCtCCCAGgCgCCaCTGGACaACGCAaGaCTGcAAtTGTTCtATCTAtCCcGGCCAtATa 



Isolate 

S14       367 ACGGGTCAtCGCATGGCaTGGGATATGATGATGAACTGGTCCCCTACgACGGCacTGGTAG 
DK7       367 ACGGGTCACCGCATGGCgTGGGATATGATGATGAACTGGTCCCCTACcACGGCGTTGGTAG 
US11      367 ACGGGTCACCGCATGGCaTGGGATATGATGATGAACTGGTCCCCTACGgCGGCGTTGGTgG 
DR4       367 ACGGGcCACCGCATGGCgTGGGATATGATGATGAACTGGTCCCCTACGACAGCGCTGGTAG 
DR1       367 ACGGGaCACCGtATGGCaTGGGATATGATGATGAACTGGTCCCCTACGACAGCGCTGGTAA 
DK9       367 ACGGGTCAtCGcATGGCgTGGGATATGATGATGAACTGGTCCCCTACAgCAGCGCTGGTAA 
S18       367 ACGGGTCACCGtATGGCATGGGATATGATGATGAACTGGTCCCCTACAACgGCGtTGGTAA 
SW1       367 ACGGGTCACCGcATGGCATGGGATATGATGATGAACTGGTCCCCcACAACaGCGcTGGTAg 
consensus     ACGGGtCAcCGcATGGCaTGGGATATGATGATGAACTGGTCCCCtACgaC-GCgcTGGTag 


SEQ ID NO:  Isolate 

5           S14       428 TAGCTCAGCTGCTCCGGATCCCaCAAGCCATCTTGGAtATGATCGCTGGTGCTCACTGGGG 
1           DK7       428 TAGCTCAGCTGCTCCGGATCCCgCAAGCCATCTTGGACATGATCGCTGGTGCTCACTGGGG 
8           US11      428 [sequence illegible in source] 
4           DR4       428 TAGCTCAGCTGCTCCGGATCCCACAAGCCATCTTGGACATGATCGCTGGTGCCCACTGGGG 
3           DR1       428 TGGCTCAGCTGCTCCGGATCCCACAAGCCATCTTGGACATGATCGCTGGaGCCCACTGGGG 
2           DK9       428 TGGCgCAGCTGCTCAGGATCCCGCAgGCCATCTTGGACATGATCGCTGGTGCCCACTGGGG 
6           S18       428 TAGCTCAGCTGCTCAGGgTCCCGCAAGCCGTCTTGGACATGATCGCTGGTGCCCACTGGGG 
7           SW1       428 TAGCTCAGCTGCTCAGGaTCCCGCAAGCCGTCTTGGACATGATCGCTGGTGCCCACTGGGG 
1-8         consensus     TaGCtCAGCTGCTCcGGaTCCC-CAaGCCaTCTTGGAcATGATCGCTGGtGCcCACTGGGG 


SEQ ID NO:  Isolate 

5           S14       489 AGTCCTaGCGGGCATAGCGTATTTcTCCATGGTGGGaAACTGGGCGAAGGTCCTaGTgGTG 
1           DK7       489 AGTCCTgGCGGGCATAGCGTATTTtTCCATGGTGGGGAACTGGGCGAAGGTCCTGGTAGTG 
8           US11      489 AGTCCTAGCGGGCATAGCGTATTTCTCCATGGTGGGGAACTGGGCGAAGGTCCTGGTAGTG 
4           DR4       489 AGTCCTAGCGGGCATAGCGTATTTCTCCATGGTGGGGAACTGGGCGAAGGTCCTGGTAGTG 
3           DR1       489 AGTCCTAGCGGGCATAGCGTATTTCTCCATGGTGGGGAACTGGGCGAAGGTCGTGGTAGTG 
2           DK9       489 AGTCCTAGCGGGCATAGCGTATTTCTCCATGGTGGGGAACTGGGCGAAGGTCGTGGTgGTa 
6           S18       489 AGTCCTAGCGGGCATAGCGTATTTCTCCATGGcGGGGAACTGGGCGAAGGTCCTGcTAGTG 
7           SW1       489 AGTCCTAGCGGGCATAGCGTATTTCTCCATGGtGGGGAACTGGGCGAAGGTCCTGaTAGTG 
1-8         consensus     AGTCCTaGCGGGCATAGCGTATTTcTCCATGGtGGGgAACTGGGCGAAGGTCcTggTaGTg 



SEQ ID NO:  Isolate 

5           S14       550 CTGCTGCTATTcGCCGGCGTtGACGCG 
1           DK7       550 CTGCTGCTATTTGCCGGCGTCGACGCG 
8           US11      550 CTGCTGCTATTTGCCGGCGTCGACGCG 
4           DR4       550 CTGTTGCTGTTTGCCGGCGTTGATGCG 
3           DR1       550 CTGTTGCTGTTTGCCGGCGTTGATGCG 
2           DK9       550 CTGTTGCTGTTTaCCGGCGTCGATGCG 
6           S18       550 CTGTTGCTGTTTgCCGGCGTCGATGCG 
7           SW1       550 CTGTTGCTGTTTtCCGGCGTCGATGCG 
1-8         consensus     CTGtTGCTgTTtgCCGGCGTcGAtGCG 
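
The consensus rows in Figure 1A appear to follow a simple column rule: an upper-case letter where all eight isolates agree, a lower-case letter for the most frequent base where they differ, and a dash where no single base predominates. The figure does not state this rule, so it is an inference from the alignment, and the sketch below (with the hypothetical names `consensus` and `BLOCK_550`) only illustrates it on the final block, positions 550 to 576.

```python
from collections import Counter


def consensus(aligned):
    """Build a consensus line from equal-length aligned sequences.

    Assumed rule (inferred from the figure, not stated in it):
    upper case = all sequences agree, lower case = most frequent base,
    "-" = two or more bases tie for most frequent.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        # Case in the figure marks differences only, so compare in upper case.
        counts = Counter(base.upper() for base in column).most_common()
        top_base, top_count = counts[0]
        if top_count == len(column):
            out.append(top_base)
        elif counts[1][1] == top_count:
            out.append("-")
        else:
            out.append(top_base.lower())
    return "".join(out)


# Positions 550-576 of Figure 1A, in the order S14, DK7, US11, DR4,
# DR1, DK9, S18, SW1 (transcribed by hand from the figure).
BLOCK_550 = [
    "CTGCTGCTATTcGCCGGCGTtGACGCG",
    "CTGCTGCTATTTGCCGGCGTCGACGCG",
    "CTGCTGCTATTTGCCGGCGTCGACGCG",
    "CTGTTGCTGTTTGCCGGCGTTGATGCG",
    "CTGTTGCTGTTTGCCGGCGTTGATGCG",
    "CTGTTGCTGTTTaCCGGCGTCGATGCG",
    "CTGTTGCTGTTTgCCGGCGTCGATGCG",
    "CTGTTGCTGTTTtCCGGCGTCGATGCG",
]

print(consensus(BLOCK_550))
```

Fully conserved columns, which print in upper case, are the natural candidates for the universal primers recited in the claims, while variable columns point to regions that may distinguish isolates.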



FIGURE IB 


1 TATGAAGTGCGCAACGTGTCCGGGgTGTACCAcGTCACaAACGACTGCTCCAACTCAAGCA 

I I I I I I I I I II I II I IN I I I I I I I I I M I I lllll MMMMMMMMMIM! 

1 TATGAAGTGCGCAACGTGTCCGGGaTGTACCAtGTCACgAACGACTGCTCCAACTCAAGCA 

I III III III llimilllMII lllllll lllll II lllll llllllll III! 

I TATGAAGTGCGCAACGTGTCCGGGGTGTACCAaGTCACcAAtGACTGTTCCAACTCGAGCA 

I III III III llllllllllllilllllllll lllll II INI llllll III I lllll 

1 TATGAAGTGCGCAACGTGTCCGGGGTGTACCATGTCACGAACGACTGTTCCAACTCGAGCA 

1 1 1 1 1 1 1 1 1 1 1 : 1 1 m 1 1 1 1 1 1 1 ii i mmiiimmiim mini m: 

1 TATGAAGTGCGCAACGTGTCCGGGGTATACCATGTCACGAACGACTGCTCCAACTtAAGCA 

I 1 1 1 1 1 1 1 1 1 1 1 1 I i I 1 1 1 1 1 I 1 1 III I II III I II 1 1 1 1 1 III 1 1 III I III I Mill 

1 TATGAAGTGCGCAACGTGTCCGGGATATACCATGTCACGAACGACTGCTCCAACTCAAGCA 

mi m mmimmiimiiiiii m mi m i mum m mini 

1 TATGAAGTGCGCAACGTGTCCGGGATATACCATGTCACGAACGACTGCTCCAACTCAAGCg 

ii mimiimiiiiiim i m mmmmmi imiiiiiiiii 

1 TAcGAAGTGCGCAACGTGTCCGGGGTGTACtATGTCACGAACGACTGTTCCAACTCAAGCA 

ii mmimmimiiiiiim mmmmmmmmmm 

1 TATGAAGTGCGCAACGTGTCCGGGGTGTAtCATGTCACGAACGACTGTTCCAACTCAAGCA 

Mill I I I I I I I I I I I II I 1 I I I I I I I I llllll II I llllllll I I I I I I I I I I I I 

1 TATGAgGTGCGCAACGTGTCCGGGGTGTACCATGTCACGAACGACTGCTCCAACTCAAGTA

Mill I I I I I I I I M II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 11 I | I I | [ I I I I I 

1 TATGAAGTGCGCAACGTGTCCGGGGTGTACCATGTCACGAACGACTGCTCCAACTCAAGTA 

I III III III III milllllll II III III III III II llllll III IIIIIIMI I 

1 TATGAAGTGCGCAACGTGTCCGGGaTGTACCATGTCACGAACGACTGCTCCAACTCAAGCA 

i iii m m m m iiiiiiii mmimmimmimimiiiiiii 

1 TATGAAGTGCGCAACGTGTCCGGGgcGTACCATGTCACGAACGACTGCTCCAACTCAAGCA 

I ill IIIIIIMI milllllll 1 1 1 1 Ml 1 1 1 1 1 III I Ml 1 1 Ml III Ml 1 1 1 1 1 

1 TATGAAGTGCGCAACGTGTCCGGGATGTACCATGTCACGAACGACTGCTCCAACTCAAGCA 

I III I III I llllll llllllll I III III III III Ml III Ml I I III Ml I I I 

1 cATGAAGTGCaCAACGTaTCCGGGATcTACCATGTCACGAACGACTGCTCCAACTCAAGTA 

I Ml I I I I I llllll llllll I I I III III I II I III I I I I III Ml III I III I I 

1 TATGAAGTGCGCAACGTgTCCGGGGTGTACCATGTCACGAACGACTGCTCCAACTCAAGTA 

I III I III III I II I II lllllll I III III III III III I I III I III I Ml III III 

1 TATGAAGTGCGCAACGTaTCCGGGGcGTACCATGTCACGAACGACTGCTCCAACTCAAGTA 
tAtGAaGTGCgCAACGTgTCCGGGgtgTAccAtGTCACgAAcGACTGcTCCAACTcaAGca 



FIGURE 1B


SEQ ID NO:
11 

24 

10 

9 

14 

15 
12 
23 
22 

17 

16 
21 
20 
25 
13 

18 
19 


Isolate 

DK1 

T10 

D3 

D1 

HK5 

HK8 

HK3 

T3 

SW2 

IND8 

IND5 

SA10 

S45 

US6

HK4 

P10 

S9 


62 TcGTGTaTGAGGCAGtGGACgTGATCATGCAtACCCCaGGGTGCGTGCCCTGCGTTCGGGA 

I III! 1 1 1 I 1 1 I I nil I ! I 1 1 1 II I I Mill I I! 1 I I ! I I 1 1 I ! 1 I ; I 1 1 1 ! I I 

62 TtGTGTtTGAGGCAGCGGACtTGATCATGCACACCCCCGGGTGCGTGCCCTGCGTTCGGGA 


62 TcGTGTATGAGACAGCGGACATGATCATGCACACCCCCGGGTGCGTGCCCTGCGTTCGGGA

I IIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIimillil 

62 TtGTGTATGAGACAGCGGACATGATCATGCACACCCCCGGGTGCGTGCCCTGCGTTCGGGA 

I Mill mill llllllllllllllllllllll II III ! II i I! ! II 111 I II II ! 

62 TCGTGTAcGAGACAaCGGACATGATCATGCACACCCCTGGGTGCGTGCCCTGCGTTCGGGA

1 1 1 1 1 1 1 II III llllllllll Mill lllllll! Ill llllllllllllllll 

62 TCGTGTATGAaACAGCGGACATGATtATGCATACCCCTGGATGCaTGCCCTGCGTTCGGGA 

llllllllll Mill I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I HIM 

62 TCGTGTATGAGACAGCaGACATGATCATGCATACCCCTGGATGCGTGCCCTGCGTaCGGGA

I I I I I I I I I I M I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I Mill 

62 TTGTGTATGAGACAGCGGACATGATCATGCAcACCCCTGGGTGCGTGCCCTGCGTTCGGGA

I I I I I I I I II I I I I I I I I I I I I I I I M I I I I Mill I I I I I I I I I I I I I I I I I I I II I I 

62 TTGTGTATGAGACAGCGGACATGATCATGCAtACCCCCGGGTGCGTGCCCTGCGTTCGGGA 

I I I I I I I II I I I I I I I I I I I I I I I II I M I I I I I I I I I I I I I I I I 1 I I I I I I II I I I I I 

62 TTGTGTATGAGGCAGCGGACATGATCATGCACACCCCCGGGTGCGTGCCCTGCGTTCGGGA

II I I I I I I I I I I I II I I I I I I I I I I II M I I I I I I I I I I I I I I I II I I I I I I I II I I I I I 

62 TTGTGTATGAGGCAGCGGACATGATCATGCACACtCCCGGGTGCGTGCCCTGCGTTCGGGA 

I I I I I II I I II I I I I II I I I I I E I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

62 TTGTGTATGAGGCAGCGGACATGATCATGCACACCCCCGGGTGCGTGCCCTGCGTTCGGGA

I II 1 1 II 1 1 1 1 1 1 1 1 1111 Mill 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 

62 TTGTGTATGAGGCAGtGGACgTGATCcTGCACACCCCtGGGTGCGTGCCCTGCGTTCGGGA

M I I I I I I I I I I I I I Mil Mill lllllll II I I I I II I I I I I I I I I I I I I I I I 

62 TTGTGTATGAGGCAGCGGACATGATCATGCACACtCCCGGGTGCGTGCCCTGtGTTCGGGA

M II II II II II II II II II I II I II II II I II I I I I I I II I I I I I I I I I II Mill 

62 TTGTGTATGAGGCAGCGGACATGATCATGCAtACCCCCGGGTGCGTGCCCTGcGTcCGGGA

I I II I I I I I I I I I I I I I I I I I I I I I Mill I I I I 1 I I II I I I I 1 I I I I I I II Mill 

62 TTGTGTATGAGGCAGCGGACATGATaATGCAcACCCCCGGGTGCGTGCCCTGtGTTCGGGA

f lllllll 1 1 1 1 1 1 1 1 1 1 1 1 1111 Mill 1 1 1 1 1 1 1 1 1 1 1 II Mill MM III 

62 TTGTGTAcGAGGCAGCGGACgTGATcATGCAtACCCCCGGGTGtGTaCCCTGcGTTCaGGA


9-25 


consensus 


TtGTGTatGAggCAgcgGACaTGATcaTGCAcACcCCcGGgTGcgTgCCCTGcGTtCgGGA 



11 

24 
10 

9 

14 

15 
12 
23 
22 

17 

16 
21 
20 

25 
13 

18 
19 


Isolate 

DK1 

T10 

D3 

D1 

HK5 

HK8 

HK3 

T3 

SW2 

IND8 

IND5 

SA10 

S45 

US6 

HK4 

P10 

S9 


123 

123 

123 

123 

123 

123 

123 

123 

123 

123 

123 

123 

123 

123 

123 

123 

123 


FIGURE 1B


GaaCAACcaCTCCCGtTGCTGGGTAGCGCTCACcCCCACGCTCGCGGCCAGGAACgCCAGd 

I Mil MIME lllllllllllllllll 1 1 1 ! I M I M I II I II I M 1 1 III!! 

GGgCAACTCCTCCCGCTGCTGGGTAGCGCTCACtCCCACGCTCGCGGCCAGGAACACCAGC 

II llillllll I II III III III III III II I I I II II I II I II I Mill I III! 

GGACAACTCCTCTCGCTGCTGGGTAGCGCTCACCCCCACGCTCGCGGCTAGGAATAGCAGC 

II II III IMI II I lllll III III llllll IN I III III I II I II I II I || | | hi | 

GGACAACTCCTCTCGCTGCTGGGTAGCGCTCACCCCCACGCTCGCGGCTAGGAATGGCAaC 

1 1 M 1 1 ii 1 1 M i! 1 1 1 1 1 1 1 1 1 1 1 1 iii iii ii i ii I ii 1 1 inn i ii i 

aAACAACTCCTCCCGTTGtTGGGTAGCGCTCgCCCCCACGCTCGCGGCcAGGAAcGcCAGC 

i iii iii mm mi mu mm i mmmmii nm i mm 

GAACAACTCCTCCCGTTGcTGGGTgGCGCTCACTCCCACGCTCGCGGCtAGGAAtGTCAGC 

1 1 ii 1 1 1 1 1 1 1 1 1 1 1 ii him mmmmmiimm mu mm 

GAACAACTCCTCCCGCTGtTGGGTAGCGCTCACTCCCACGCTCGCGGCCAGGAACGTCAGC 

II III MIMimil MMIMMM 1111111111111111111111111 Mil 

GAgCAAtTCCTCCCGCTGCTGGGTAGCGCTtACTCCCACGCTCGCGGCCAGGAACGCCAGC 

i iii mmmmmiimm imiiiiiii n iimiimi him 

GGcCAACTCCTCCCGCTGCTGGGTAGCGCTCACTCCCACGCTaGCaGCCAGGAACaCCAGC 

II III!! Ill I II Mil III III II Ml I II III II II II llllll lllll 

GGGCAACTtCTCTaGtTGCTGGGTAGCGCTCACTCCCACTCTCGCGGCtAGGAACGCCAGC 

IIIIIIH Mil I MMIIIII Ml MIMIMIMM! NMM 111111111111 

GGGCAACTCCTCTCGCTGCTGGGTAGCGCTCACTCCCACTCTCGCGGCCAGGAACGCCAGC 

I llillllll 11111111111111111111111111 I I II I I I I I I I I I I I lllll 

GAACAACTCCTCCCGCTGCTGGGTAGCGCTCACTCCCACGCTCGCGGCCAGGAACTCCAGC 

miimimm iimm mmmimmmmmiimi iiimimimmi 

GAACAACTCCTCCCGtTGCTGGGTgGCGCTCACTCCCACGCTCGCGGCCAGGAACTCCAGC 

mm iimm iimm mmimmimmiiiiiiim i mi 

GAACAAtTCCTCCCGcTGCTGGGTAGCGCTCACTCCCACGCTCGCGGCCAGGAACGCtAGC 

mm iimm mmimmiHimmiHimmimmi m 

GAACAACTCCTCCCGtTGCTGGGTAGCGCTCACTCCCACGCTCGCGGCCAGGAACGCCAGC 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 h 1 1 1 1 1 1 1 1 1 1 mum mu iiiii 

GAACAACTCCTCCCGcTGCTGGGTAGCGCTCACTCCCACaCTCGCGGCtAGGAAttCCAGC

i 1 1 1 1 1 1 1 1 1 1 iimm iimm iiiii iiimii iiiii i i i 

GggtAACTCCTCCCaaTGCTGGGTgGCGCTCACcCCCACgCTCGCGGCcAGGAAcgCtAcC 


9-25 


consensus 


gaacAActcCTCccgcTGcTGGGTaGCGCTcaCtCCCACgCTcGCgGCcAGGAAcgccAgC



FIGURE 1B


Isolate 

DK1 


HK5 

HK8 

HK3 

T3 

SW2 

IND8 

IND5 

SA10 

S45 

US6 

HK4 

P10 

S9

consensus 


184 aTCCCCACTACGACaATACGACGCCATGTCGATTTGCTCGTTGGGGCGGCTGCTTTCTGCT 

iiiiniiiiiii 1 1 1 1 1 1 1 1 1 1 1 1 f 1 1 1 1 1 1 1 1 1 ; 1 1 1 i m i i i 1 1 1 1 i 1 1 1 i : 1 1 1 

184 GTCCCCACTACGACgATACGACGCCATGTCGATTTGCTCGTTGGGGCGGCTGCTTTCTGCT 

11111111111111 lllllllim I I I I I I II II I I II I M I I I ! I | i | ! | ! | | | | | | 

184 GTCCCCACTACGACaATACGACGCCACGTCGATTTGCTCGTTGGGGCGGCTGCTTTCTGCT 
184 GTCCCCACTACCK3C 1 1 1 « I I I I I I I I I I I I I I III I I I II I I I I 

i ii 1 1 1 1 1 ii i ii 9 ii iii 1 111 1 ii i ii 1 1 1 1 1 1 1 1 1 1 1 1 1 1 nttuTfiTf nTff 

184 GTCCCCACcACGGCAATACGACGCCACGTCGACTTGCTCGTTGGGGCGGCTGCTTTCTGCT 

mum iii mmimmimmimmimmimiiimim 

184 GTCCCCACtACGACAATACGACGCCACGTCGACTTGCTCGTTGGGGCGGCTGCTTTCTGCT 

mum 1 1 1 1 1 1 ni r 1 1 1 1 11111111111111111111111111111 imm 

184 GTCCCCACcACGACAATACGACGTCACGTCGACTTGCTCGTTGGGGCGGCTGCcTTCTGCT 

1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 mu i 

184 GTCCCCACTAaGACAATACGACGTCACGTCGACTTGCTCGTTGGGGCGGCTGCTTTCTGtT 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 m 1 1 1 1 1 1 mum uumumumuimm 1 

184 GTCCCCACTACGACAATACGACGCCACGTCGATTTGCTCGTTGGGGCGGCTGCTTTCTGcT 

uiiiiii uumumumumumumumumumm 1 

184 GTCCCCACCACGACAATACGACGCCACGTCGATTTGCTCGTTGGGGCGGCTGCTTTCTGTT 

HI I III III II II III III I I III III III II III I III III III III III I III III 

184 GTCtCCACCACGACAATACGACaCCACGTCGATTTGCTCGTTGGGGCGGCTGCTTTCTGTT 

HI 1111 II I I I I II I I I I I II III II III III III III III III II III III II I I 

184 GTCCCCACTACGACAATACGACGCCACGTCGATTTGCTCGTTGGGGCGGCTGCTTTCTGCT

, 1 III ill l III . 1,1 ill. 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

184 GTCCCCACTACGACAATACGACGtCACGTCGAtTTGCTCGTTGGGGCGGCTGCTTTCTGCT

uumumumium 11 mmmiiimi 11111 iii 1 uiiuiu 

184 GTCCCCACTACGACAATACGACGCCACGTCGATTTGCTCGTTGGGGCGGCTaCTTTCTGCT 

uumumumiuum mu uuuuuuuuu uuiiiu 

184 aTCCCCACTACGACAATACGACGCCATGTCGAcTTGCTCGTTGGGGCGGCTGCTTTCTGCT

mi iiiiii uumumumu uumumummumm 

184 GTCCCaACTACGgCAATACGACGCCATGTCGATTTGCTCGTTGGGGCGGCTGCTTTCTGCT

JJJLiJ. U IJUL 'i 11111111 111 uumumumiuum mum 

184 GTCCCcACcACGaCAATACGACGtCATGTCGATTTGCTCGTTGGGGCGGCTGtTTTCTGCT
gTCcCcACtAcGaCaATACGACgcCAcGTCGAtTTGCTCGTTGGGGCGGCTgctTTCTGcT


245 CCGCTATGTAcGTGGGgGACCTCTGCGGATCcGTTTTCCTCGTCTCTCAGCTGTTCACCTT

245 CCGCTATC^ 1 IM 1 Ml II I I I I I I I I III I I ! I I I I I I I I I I I I I 

45 ( j : Yff r ^YT t TT?TT a ?tcff c Tf???TTffTTTTT??T??T?Tf r ?TT?T?TT?t?f!T 

245 CCGCCATGTACGTGGGGGATCTtTGCGGATCTGTTTTCCTCGTCTCCCAGCTGTTCACCTT 

1 1 1 ! 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 i Mini III III III II! I 1 1 1 1 1 : 1 1 1 1 [ | M i 1 I 

245 CCGCCATGTACGTGGGGGATCTcTGCGGATCTGTTTTCCTCaTCTCCCAGCTGTTCACCcT 

245 CCGCTATCTiaST^^ 1 > > > I i I I I I I M t il I I I I I I I ] I I I 1. 1 ! f I I I I i I 

2 45 1 1 1 1 if 1 1 It i m 1 1 1 1 1 1 TTT^TTTtTTTTTTTTTTTTTTTTTTTtTTTTTTTtTTTT 
45 i i m iti i n i i i i i m i TTTTT??T?tTTTTTTTT?TT c TT?T??TT?frfT7??TTTT 

245 CCGCTATGTACGTGGGGGATCTCTGCGGATCTGTTTTCCTtGTCTCCCAGCTGTTCACCTT


II 

245 CCGCTATGTACGTGGGGGATCTCTGCGGATCTGTTTTCCTCGTCTCCCAGCTGTTCACTTT 

III I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 

245 CCGtTATGTACGTGGGGGATCTCTGCGGATCTGTTTTCCTCGTCTCCCAGCTGTTCACTTT 

III I I I I I I I I I II I II I I I I I I I I I I I I I 11 I I I I I I I II III llllllllllll II 

245 CCGCTATGTACGTGGGGGATCTCTGCGGATCTGTTTTCCTtGTCTCCCAGCTGTTCACCTT 


245 CCGCTATGTACGTGGGGGATCTaTGCGGATCTGTTTTCCTcGTCTCCCAGCTGTTCACCTT 

INI 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II lllllllllllllllll II III II I Mill I II III I 

245 CCGCcATGTACGTGGGGGAcCTCTGCGGATCTGTTTTCCTTGTCTCCCAGCTGTTCACCTT 

Mil I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I 

245 CCGCTATGTACGTGGGGGAtCTCTGCGGATCTGTTTTCCTTGTtTCCCAGCTGTTCACCTT 


I I I I I I I I I I II I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I I I I I I 

245 CCGCTATGTACGTGGGGGAcCTCTGCGGgTCcGTTTTCCTCaTCTCCCAGCTGTTCACCTT 

Mil 1 1 II I II I II I II 1 1 1 1 1 II I II II llllll 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

245 CCGCcATGTACGTGGGaGATCTCTGCGGATCTGTcTTCCTCGTCTCCCAGtTGTTCACCTT 

Mil I I II I II I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

245 CCGCTATGTACGTGGGGGATCTCTGCGGATCTGTTcTCCTCGTCTCCCAGCTGTTCACCTT 

M I II II I 11 II I II I II I II I I I I II I I I I I I Mill I I II M I I II I I I I I I I I 

245 CCGCTATGTACGTGGGGGAcCTgTGCGGATCTGTTtTCCTCaTCTCCCAGCTGTTCACCaT 
CCGctATGTAcGTGGGgGAtCTcTGCGGaTCtGTttTCCTcgTcTCcCAGcTGTTCACctT


306 tTCaCCTCGCCGGCATGAGACagcaCAGGACTGCAACTGCTCAATCTATCCCGGCCAcgTt

II I I I I I I I I I II I I I I I I llllllllllllllllllllllllllllllll I 

306 CTCGCCTCGCCGGCATGAGACttTgCAGGACTGCAACTGCTCAATCTATCCCGGCCAtcTG

I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I i I I I I I I I I I I II 

306 CTCGCCTCGCCGGCATGAGACaGTACAGGAaTGTAACTGCTCAATCTATCCCGGCCACGTG 

I II I I I I I I I I I I I I I I 1 I I I I I I I I I I I lllll I III I I I II I I III I I I I I I I II I 

306 CTCGCCTCGCCGGCATGAGACGGTACAGGAgTGTAAtTGCTCAATCTATCCCGGCCACGTG 

I I I I I I I I I I I I II llllllllllllll II II I I I I I I I I I I I I I I I I I I I I I I I 

306 CTCGCCTCGCCGACACGAGACGGTACAGGACTGCAACTGCTCAATCTATCCCGGCCACGT.fi 

I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 

306 tTCGCCTCGCCGACACGAGACGGTACAGGACTGCAACTGCTCAATCTATCCCGGCCACGTA 

I III III III III III II II I I I I I I I I I ! I I I I I I II I I I I llllll lllll III l! 

306 CTCGCCTCGCCGACACGAGACAGTACAGGACTGCAACTGCTCAcTCTATCCCGGCCACGTA 

1 1 1 1 1 1 1 1 1 1 1 1 ii 1 1 1 1 1 1 1 1 1 1 1 1 1 r i i 1 1 1 1 1 1 1 1 1 1 1 iiiii! mm mi: 

306 CTCGCCTCGCCGGCAtGAGACAGTACAGGACTGCAACTGCTCAATCTATCCCGGCCACGTA 

II I I I I I I I I I I I III III Mill I II llllll III II I I I I I I I I I I | | | | f | I | 

306 tTCACCTCGCCGGCAcGAGACAGTACAGGACTGCAACTGtTCCATCTATCCCGGCCACGTA 

I III I III III I I III I III III III III III I II III III I III III I III III I 

306 CTCACCGCGCCGGCATGAGACAGTACAGGACTGCAATTGCTCCATCTATCCCGGCCACGTA 

I III Ml III I III III III III III III III III III III III III III III III III I I 

306 CTCACCGCGCCGGCATGAGACAGTACAGGACTGCAATTGCTCCATCTATCCCGGCCACGTA 

m ii mm mmimmimiiimiiiii mmmmi mi 

306 CTCGCCTCGCCGGtATGAGACAGTACAGGACTGCAATTGCTCAATCTATCCCGGCCgCGTA

I Ill m II III ill III MiiimiM ii M : 1 1 1 1 1 1 1 1 1 1 1 1 1 mi 

306 CTCGCCTCGTCGGCATGAGACAGTACAGGACTGCAAcTGTTCAATCTATCCCGGCCACGTA 

1 1 1 1 1 ii m 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 si 1 1 1 1 1 1 1 1 1 mi mi miMimmim 

306 CTCGCCTCGTCaGCATGAGACAGTACAGGACTGCAATTGTTCAATCTATCCCGGCCACGTA 

MMMMI I 1 1 1 1 1 1 1 1 1 I M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 

306 CTCGCCTCGCCGGCATGAGACgGTACAGGACTGCAATTGcTCAATCTATCCCGGCCACGTA

III I I I I I I I I I I II III I t I I I I I I I 1 I I I I I I I I I I I I I I I I I I lllllllll 

306 CTCaCCTCGCCGGCATtgGACAGTACAGGACTGCAATTGtTCAATCTATCCtGGCCACGTA 

III II II llllll I I I I I M I I I I I I I I I I I I I I I I I I I I I I II II lllll 

306 CTCgCCcCGtCGGCATgaGACAGTACAGaACTGCAATTGcTCAATCTATCCcGGaCACGTg
cTCgCCtCGcCggcAtgaGACagtaCAGgAcTGcAAcTGcTCaaTCTATCCcGGcCacgTa
consensus


367 TCAGGTCACCGCATGGCTTGGGAtATGATGATGAACTGGTCaCCTACAACAGCcCTAGTGC

I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I lllllllllll Ml!!! 

367 TCAGGTCACCGCATGGCTTGGGAcATGATGATGAACTGGTCGCCTACAACAGCtCTAGTGG

' 1 1 1 M I ! 1 1 ! ! ! ! 1 1 1 1 ! ! I ! ! 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 III! !!!!!!! 

367 ACAGGTCACCGCATGGCTTGGGATATGATGATGAACTGGTCGCCTACAgCAGCCCTAGTGG

iiiniiiiii iiiiiiiiiiiiiiiiiiiiiiiiiiiii linn iiiii mill 

367 ACAGGTCACCGtATGGCTTGGGATATGATGATGAACTGGTCACCTACAACAGCCtTAGTGG

iiiiiiiiiii i!i!iiiiiiiMniiiiiiiiiiiiiimiii!iiiiii min 

367 ACAGGTCACCGCATGGCTTGGGATATGATGATGAACTGGTCACCTACAACAGCCCTAGTGG

1 1 1 1 1 1 m i M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 m 1 1 m 1 1 1 1 ii I ii mmmmim 

367 TCAGGTCACCGCATGGCTTGGGATATGATGATGAACTGGTCgCCcACAACAGCCCTAGTGG 

mmimmimmimmimiiiiiiim n m mmmm 

367 TCAGGTCACCGCATGGCTTGGGATATGATGATGAACTGGTCcCCtACAgCAGCCCTAGTGG 

1 1 1 1 1 1 1 1 1 1 mmimmiimmimiiii n m i n imm 

367 aCAGGTCACCGtATGGCTTGGGATATGATGATGAACTGGTCgCCcACAaCgGCaCTAGTGG

milllll! 1 1 II I M i 1 1 1 1 1 ! I II 1 1 II 1 1 1 1 1 1 1 11 III I II II Mil 

367 TCAGGTCACCGCATGGCTTGGGAcATGATGATGAACTGGTCACCTACAGCaGCCCTgGTGG

1 1 1 1 ii ii i ii i ii 1 1 1 1 1 m 1 1 1 1 1 1 1 1 1 1 1 1 r 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 mil ini 

367 TCAGGTCACCGCATGGCTTGGGATATGATGATGAACTGGTCACCTACAGCgGCCCTAGTGG 

r 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 mmimmimmiimiiimi mmim 

367 TCAGGTCACCGCATGGCcTGGGATATGATGATGAACTGGTCACCTACAGCAGCCCTAGTGG 

mmmmim mmmmmmmmmm mi iiiii i 

367 ACAGGTCACCGCATGGCTTGGGATATGATGATGAACTGGTCACCTACAaCAGCtCTAGTaG 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 n 1 1 m n m 1 1 m i mm mi nn i 

367 ACAGGTCACCGCATGGCTTGGGATATGATGATGAACTGGTCgCCTACAGCAGCCtTAGTGG 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 iiiii 1 1 1 1 1 1 1 1 1 1 1 1 mm 

367 TCAGGTCACCGCATGGCTTGGGATATGATGATGAAtTGGTCACCTACAGCAGCCCTAGTGG 

ii n iiMiiiiiii mnm mini iiiii mmnnmmmmm 

367 TCAGGTCACCGCATGGCTTGGGATATGATGATGAACTGGTCACCTACAGCAGCCCTAGTGG 

mmimmiiiiim nmmmmiimi n imimimiiii 

367 TCAGGTCACCGCATGGCTTGGGATATGATGATGAACTGGTCGCCcACAGCAGCCCTAGTGG 

niim nimii mm Mmmm m mm in nnnnnn 

367 aCAGGTCAtCGCATGGCcTGGGATATGATGATGAACTGGTCGCCtACAaCAGCCCTAGTGG
tCAGGTCAcCGcATGGCtTGGGAtATGATGATGAAcTGGTCaCCtACAgCaGCccTaGTgg


428 TaTCGCAGTTACTCCGaATCCCACAAGCTGTCgTGGACATGGTGgCgGGGGCCCACTGGGG

I 1 1 m 1 1 1 1 1 1 1 1 1 1 1 M i 1 1 1 1 1 1 1 M I M 1 1 1 1 1 1 1 1 1 I INI 1 1 1 1 1 1 1 1 1 1 

428 TgTCGCAGTTACTCCGGATCCCACAAGCTGTCaTGGACATGGTGaCaGGGGCCCACTGGGG 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I lllllllllll I I I I I I M I I I I I I I 

428 TATCGCAGTTACTCCGGATCCCACAAGCTGTCgTGGACATGGTGGCGGGGGCCCACTGGGG 

I I I I I I I I I I I I I I I M I I I I I I I I I I II I I I I I I I I I I I !! I I ! I I I I I | I | | | | | | | | 

428 TATCGCAGTTACTCCGGATCCCACAAGCTGTCaTGGACATGGTGGCGGGGGCCCACTGGGG 

I I M I I I I I I I I I I I I I I I I I I I I II I I I I llllllllll lillllMllllillll 

428 TGTCGCAGTTACTCCGGATCCCGCAAGCTGTCGTGGACATGGTaGCGGGGGCCCACTGGGG 
llllillllllllllllllllillillli I I I I I I I I I I f N lllllllllll II! ||! 
428 TGTCGCAGTTACTCCGGATCCCGCAAGCTaTCGTGGACATGGTGGCGGGGGCCCACTGGGG 
lllllll 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 III III IIMIIIIIIIIIIIIII! Ill III 
428 TGTCGCAaTTACTCCGGATCCCGCAAGCTGTCGTGGACATGGTGGCGGGGGCCCACTGGGG 

lllllll II lllllllllll I I I I I I I I I I I I I 1 I I II I I I I I I I I | I | | | | | | | I | | 

428 TGTCGCAGTTgCTCCGGATCCCACAAGCTGTCGTGGACATGGTGGCGGGGGCCCACTGGGG 
I I I I M I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I II I II I llllll II III 
428 TATCGCAGTTaCTCCGGATCCCACAAGCTGTCGTGGACATGGTaGCGGGGGCCCACTGGGG 

III!!! {Ill I IMIII I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

428 TATCGCAGTTGCTCCGGATCCCACAAGCTGTCGTGGATATGGTGGCGGGGGCCCACTGGGG 

I I I I I I I I 111 II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I III | | | | | | | | | | | I I I 
428 TATCGCAGTTGCTCCGGATCCCACAAGCTGTCGTGGATATGGTGGCGGGGGCCCACTGGGG 
IIMIIIIM I I I II I I I I I I I I I I I I I lllllll I I I I I I I I | II I I II I I I I I II I 
428 TATCGCAGTTACTCCGGATCCCACAAGCTaTCGTGGACATGGTGGCGGGGGCCCACTGGGG 


428 TATCGCAGTTACTCCGGATCCCACAAGCTGTCGTGGACATGGTGGCGGGGGCCCACTGGGG 

„ I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I M I | | | | | | | | | | | | | | | | 

428 TATCGCAGTTACTCCGGATCCCACAAGCTGTCATGGACATGGTGGCGGGGGCCCACTGGGG 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I llllllllll 

428 TATCGCAGTTACTCCGacTCCCACAAGCTGTCATGGACATGGTGGCGGGaGCCCACTGGGG 
I MINI lllllll lllllllllll || | III llllllllll lllllllllll 
428 TgTCGCAGCTACTCCGGATCCCACAAGCTaTCtTGGATgTGGTGGCGGGGGCCCACTGGGG
I I II I II I II I II I III II I II I II I II II I II I I I I I I I I I I I | | | | | | | | | INI 
428 TaTCGCAGCTACTCCGGATCCCACAAGCTgTCaTGGATaTGGTGGCGGGGGCCCACTGGGG 

TaTCGCAgtTaCTCCGgaTCCCaCAAGCTgTCgTGGAcaTGGTggCgGGgGCCCACTGGGG


489 AGTCCTGGCGGGCCTcGCCTACTAcTCCATGGCGGGGAACTGGGCcAAGGTTTTAATTGTG

tt i j 1 1 1 1 1 1 1 1 1 1 1 iiiiiiii 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 : 1 1 1 i 1 1 1 1 1 1 i 1 1 1 

489 AGTCCTGGCGGGCCTtGCCTACTATTCCATGGCGGGGAACTGGGCTAAGGTTTTAATTGTG 
1 1 1 1 1 1 1 1 1 1 1 1 1 I II! II I II I II I M II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 | Mini 
489 GGTCCTGGCGGGCCTCGCCTACTATTCCATGGTGGGGAACTGGGCTAAGGTTTTGATTGTG 


489 GGTCCTGGCGGGCCTCGCCTACTATTCCATGGTGGGGAACTGGGCTAAGGTTTTGATTGTG 
II I II I Ml III III I I I I I II I I I II I I I I I I I I I I I I I I M II I I I I I I I M I I I I I 
489 GGTCCTGGCGGGCCTTGCCTACTATTCCATGGTGGGaAACTGGGCTAAGGTTTTGATTGTG 

HIM I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 

489 AGTCCTAGCGGGCCTTGCCTACTATTCCATGGTGGGcAACTGGGCTAAGGTTTTGATTGTG 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II 

489 AGTCCTAGCGGGCCTTGCCTACTATTCCATGGTGGGaAACTGGGCTAAGGTTTTGATTGTG 

IIMII I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I II I I I I I I I I I I I II I I II M 

489 AGTCCTGGCGGGCCTTGCCTACTATTCCATGGTGGGGAACTGGGCTAAGGTTTTGATTGTG 

I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I 

489 AGTCCTGGCGGGCCTTGCaTACTATTCCATGGTGGGGAACTGGGCTAAGGTTTTGATTGTG 

I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I II II 

489 AATCCTGGCGGGCCTTGCCTACTATTCCATGGTAGGGAACTGGGCTAAGGTTTTGATTGTG 


489 AATCCTGGCGGGCCTTGCCTACTATTCCATGGTAGGGAACTGGGCTAAGGTTTTGATTGTG 

I INI I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I i I I I I I I I I I II I I I I I I I I I 

489 AGTCCTaGCGGGCCTTGCCTACTATTCCATGGTGGGGAACTGGGCTAAGGTTTTGATTGTt 

Ml I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 

489 AGTCCTGGCGGGCCTTGCCTACTATTCCATGGTGGGGAACTGGGCTAAGGTTCTGATTGTG 

I I I I I I I I I I I I I I II I I I I I I I I I I I I II ! I I I I I I I I I I I I I I I I I I I I 1 I I I I I I 

489 AGTCCTGGCGGGCCTTGCCTACTATTCCATGGTGGGGAACTGGGCTAAGGTTCTGATTGTG 

llllll I I I I I I II I I I Mill I I II I I I I I I I I I I I I II llllll IIIIIIII 

489 AGTCCTaGCGGGCCTTGCtTACTATTCCATGGTGGGGAACTGGGCcAAGGTTTTGATTGTG 

llllll I I I I I I I II I I I I I I II I II I I I I II I II II I II I I I Mill I I II I I II I 

489 AGTCCTGGCGGGCCTTGCCTACTATTCCATGGTGGGGAACTGGGCTAAGGTcTTGATTGTG 

M II II II II II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 

489 AGTCCTGGCGGGCCTcGCCTACTATTCCATGGTGGGGAACTGGGCTAAGGTtTTGATTGTG
agTCCTgGCGGGCCTtGCcTACTAtTCCATGGtgGGgAACTGGGCtAAGGTttTgATTGTg


550 tTGCTACTCTTTGCCGGCGTTGATGGG

1 1 1 1 I ! 1 1 ! I ! E 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 

550 ATGCTACTCTTTGCCGGCGTTGATGGG 

iimiiiiiiiii Mill II II 

550 ATGCTACTCTTTGCTGGCGTcGACGGC 

i! inn m iiiiiiiiii nun 

550 ATGCTACTCTTTGCTGGCGTTGACGGC 

1 I I I I I I I Mill llilllll II 

550 ATGCTACTtTTTGCCGGCGTTGATGGG 

llilllll llllllllllllllllll 

550 ATGCTACTgTTTGCCGGCGTTGATGGG

llilllll llllllllllllllllll 

550 ATGCTACTtTTTGCCGGCGTTGATGGG 

Iflllll I I I I I I I II I I I I I I I I I 

550 cTGCTACTCTTTGCCGGCGTTGATGGG 

I I I I I I I I I I I I I llilllll III 

550 ATGCTACTCTTTGCtGGCGTTGACGGG

I I II I I I I If I I I I I I I I I I I I I I I I 

550 ATGCTACTCTTTGCCGGCGTTGACGGG

I I I I I I I I I I I I I I I I I I I I I I I I I I I 

550 ATGCTACTCTTTGCCGGCGTTGACGGG

I I n 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 n 1 1 1 n 

550 ATGCTACTCTTTGCCGGCGTTGACGGG

I I I I 1 I I I I I I I I I I I I I I I I I I I I I I 

550 ATGCTACTCTTTGCCGGCGTTGACGGG 

II I I I I I I I I I I I I I I I I I I I I I I I I 

550 tTGCTACTCTTTGCCGGCGTTGACGGG

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 n 1 1 1 1 1 1 1 1 1 

550 ATGCTACTCTTTGCCGGCGTTGACGGG 

I I I 1 1 1 1 1 1 I II 1 1 1 ! I 1 1 1 1 1 1 1 1 1 

550 ATGCTACTCTTTGCCGGCGTTGACGGa

iiiiini inn n nnnn 

550 ATGCTACTtTTTGCtGGtGTTGACGGg
aTGCTACTcTTTGCcGGcGTtGAcGGg


T2

27 

T4
T9

29 

US10 


1 GCc CAAGTI^gGAACAC CAg c C g C gG t TACATGG TGAC t-AAC GACTGTTC cAATGAg AG CAI 
1 GCa CAAGTGAAGAACAC CAc TAa CAG CTACATGG TGAC cAACGACTGTTC tJJtTQACAG CA 

ii iiimmmm n iiiiiiiiiniiii n iimn 11 miiii 

1 GCCgAAGTGAAGAACACCACSTACCAGCTACATCGTCACaAATGACTCTTCCAACGACAGCA 
I I 1 1 1 1 1 1 1 1 1 1 II I N I 1 1 1 1 1 1 1 1 I 1 1 1 I I I I lllllll! Illllllillitl 
1 GtCcAAGTGAAaAACACCAGTACCAGCTAtATGGTGACcAATGACTGcTCCAACGACAGCA 

GcccAAGTGAagAACACCAgtacCaGcTAcATGGTGACcAA-GACTGtTCcAA-GAcAGCA 


62 TCACcTGGCAGCTCCAaGCCGCGGTtCTCCACGTCCCCGGGTGTaTCCCGTGtGAGAggct

INI 1 1 M I I 1 1 1 1 I llllllll III III 1IMII ! II IN lllllll III! 

62 TCACtTGGCAGCTCCAGGCCGCGGTCCTCCACGTCCCCGGGTGTGTCCCGTGCGAGAaAac 

INI Mill I I I I I I I I I I I I I I II i I I I I I I I I I | I | | | | | | | | | | | | | | | | | j 

62 TCACcTGGCAACTCCAGGCCGCGGTCCTCCACGTCCCCGGGTGcGTCCCGTGCGAGAgAGT

Nil llllllll MU M I III III III II llllllll II III III III 1 1 III 

62 TCACtTGGCAACTtgAGGCtGCGGTCCTCCACGTtCCCGGGTGtGTCCCGTGCGAGAaAGT 

TCAC-TGGCA-CTccAgGCcGCGGTcCTCCACGTcCCCGGGTGtgTCCCGTGcGAGA-agt


123 GGGAAATACATCcCGaTGCTGGATACCGGTcaCACCAAACGTGGCCGTGCGGCAGCCCGGC 

III II Mill Hill Mil II III I! 1 1 Ill mill Mill I 

123 GGGAAATACATCtCGGTGCTGGATACCGGTtTCACCAAACGTGGCCGTGCGGCAGCCCGGC 
Mill I M 1 1 1 1 1 1! 1 1 1 1 1 1 1 1 M II llllllll II llll || III III 
123 tGGAAAcgCgTCgCGGTGCTGGATACCGGTCTCgCCAAACGTaGCtGTGCAGCGGCCTGGC 

I I I I I I II IN III III III I III III I III I I II II III ill III III III 

123 gGGAAAtaCaTCtCGGTGCTGGATACCGGTCTCaCCAAAtGTgGCcGTGCAGCGGCCTGGC
gGGAAAtaCaTCtCGgTGCTGGATACCGGTctCaCCAAAcGTgGCcGTGC-GC-GCC-GGC


184 GCtCTtACGCAGGGCTTGCGGACGCACATcGACATGGTTGTGATGTCCGCCACGCTCTGCT 

M ii mmimiiimmiiii mmmmmmmmimm 

184 GCCCTCACGCAGGGCTTGCGGACGCACATtGACATGGTTGTGATGTCCGCCACGCTCTGCT

mmimi mi iiiii iii iii mi i mmmmmmmmimm 

184 GCCCTCACGCAGGGCTTGCGGACGCACATCGACATGGTTGTGATGTCCGCCACGCTCTGCT 
M I I I I I I I I I I I I I ! I I I I III I I I 1 I I I I I M I I | I II I I I I III I I I I I I I I IN I 
184 GCCCTCACGCAGGGCTTGCGGACtCACATCGACATGGTcGTGATGTCCGCCACGCTCTGCT 

GCcCTcACGCAGGGCTTGCGGACgCACATcGACATGGTtGTGATGTCCGCCACGCTCTGCT 


245 CTGCcCTcTACGTGGGGGACCTCTGCGGCGGGGTGATGCTCGCAGCCCAGATGTTCATtGT

mi. 1 ii mmimmimmimmimmimiiiiiimm n 

245 CTGCTCTtTACGTGGGGGACCTCTGCGGCGGGGTGATGCTCGCAGCCCAGATGTTCATcGT 

, i mu 1 1 1 1 1 1 1 1 1 1 1 mmimmi niiiiii n mmimi i 

245 CCGCTCTcTACGTGGGGGAtCTCTGCGGCGGGGTaATGCTCGCcGCtCAGATGTTCATTaT

INN 1 1 1 1 1 1 1 1 1 1 1 II lllllll III I NIIIIII II II llllll III I 

245 CCGCTCTtTACGTGGGGGActTCTGCGGtGGGaTgATGCTCGCaGCcCAaATGTTCATTgT 


C-GCtCT-TACGTGGGGGAccTCTGCGGcGGGgTgATGCTCGCaGCcCAgATGTTCATtgT


306 CTCGCCGCgACgcCACTGGTTTGTGCAAGAaTGCAATTGCTCcATCTACCCcGGtACCATC

, II III III Illlllllllllll I II I II I II 1 1 MM III! || mm 

306 CTCGCCGCAACAtCACTGGTTTGTGCAAGAcTGCAATTGCTCtATCTACCCTGGcACCATC

I I I I I I I I I II Illlllllllllll II I I I I I I I I I | || | | | | | | | | | | | | | | 

306 CTCGCCGCAgCACCACTGGTTTGTGCAGGAATGCAACTGCTCCATtTACCCTGGTACCATC 

, iiiiiiii 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 inn miiiiii 

306 CTCGCCGCgcCACCACTcGTTTGTGCAGGAATGCAACTGCTCCATcTACCCcGGTACCATC 

CTCGCCGC-aCacCACTgGTTTGTGCA-GAaTGCAA-TGCTCcATcTACCC-GGtACCATC


367 ACTGGACACCGTATGGCATGGGAcATGATGATGAACTGGTCGCCCACaGCCACCATGATCC 
I NIMH 1 1 ! 1 1 1 1 ! i 1 1 1 1 1 1 ! 1 1 1 1 1 [ | 1 1 1 1 1 1 1 j| 1! I ! 

367 inm miTTnTimT?^ 

367 ACTGGACACCGTATGGCATGGGACATGATGATGAACTGGTCGCCCACaaCCACCATGATCt 

ii ii 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 e 1 1 1 1 1 1 1 1 ii i mi mu 

367 ACcGGgCACCGTATGGCATGGGACATGATGATGAACTGGTCGCCCACggCCACttTGATCc 
ACtGGaCACCGTATGGCATGGGAcATGATGATGAACTGGTCGCCCAC-gCCACcaTGATCc


428 TGGCGTACGCGATGCGCGTTCCCGAGGTCATCaTAGACATCaTcgGCGGGGCtCACTGGGG 

mi mill in iii in in inn mi i iiiiiiii i iiiimi nmimi 

428 TGGCGTACGCGATGCGCGTTCCCGAGGTCATCtTAGACATCgTtAGCGGGGCaCACTGGGG

111111111111 1 1 1 1 1 ii 1 1 1 1 1 1 1 ii 1 1 1 1 i 1 1 if tin 

428 TGGCGTACGCGATGCGCGTTCCCGAGGTCATCATAGACATCATcAGCGGaGCtCACTGGGG 

iiifiii i JL mimimiii mmiii m in n n mu n h mu 

428 TGGCGTACGtGATGCGCGTTCCCGAGGTCATCATAGACATCATtAGCGGgGCgCAtTGGGG
TGGCGTACGcGATGCGCGTTCCCGAGGTCATCaTAGACATCaT-aGCGGgGCtCAcTGGGG


489 CGTCATGTTtGGCTTGGCCTACTTCTCTATGCAGGGAGCGTGGGCGAAgGTCaTTGTCATC
, QQ iiiiiii | i m III III III III I | III I III I III I | IIIIIIII I | | III 1111 | 
489 CGTCATGTTCGGCTTGGCCTACTTCTCTATGCAGGGAGCGTGGGCGAAaGTCGTTGTCATC 
i nun I I I I I I I I I I III III III III III III III III III | | m m IN III 
489 CGT^TGyrCTOCc^AGCCTACTTCTCTATGCAGGGAGCGTGGGCGAAgGTCGTTGTCATC
MM MIIIIII 1 1 II 1 1 1 1 1 I II 1 1 1 1 1 1 1 1 1 j 1 1 1 1 1 1 | M | 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
489 CGTCtTGTTCGGCtTAGCCTACTTCTCTATGCAGGGAGCGTGGGCGAAaGTCGTTGTCATC 

CGTCaTGTTcGGCtT-GCCTACTTCTCTATGCAGGGAGCGTGGGCGAA-GTCgTTGTCATC


550 CTctTGCTGGCtGCTGGGGTGGACGCG 

M IIIIMI I Ml Ml III Ml 1 1 

550 CTtcTGCTGGCCGCTGGGGTGGACGCG 

ii mi mini ii i in in 

550 CTgtTGCTcaCCGCTGGcGTGGACGCG

II Mil IMIIII 1 1 1 II 1 1 1 1 

550 CTtcTGCTagCCGCTGGgGTGGACGCG
CTt-TGCTggCcGCTGGgGTGGACGCG


1

1 GTGGAAGTCAGGAACATCAGTTCcAGCTACTACGCCACCAATGATTGCTCAAACAACAGCA

1 1 1 1 1 1 N 1 1 1 1 1 1 1 II 1 1 1 1 1 1 llllllll II I III II II! Ill 1 1! 1 1! !! Mill 

1 GTGGAAGTCAGGAACATCAGTTCTAGCTACTAtGCCACCAATGATTGCTCAAACAgCAGCA

llllllllllllllll 1 1 1 1 1 1 1 1 1 Mill III MMIMMIMM Mill 

1 GTGGAAGTCAGGAACAcCAGTTCTAGtTACTAcGCCACCAATGATTGCTCAAACAaCAGCA
GTGGAAGTcAGgAACA-CAGTTCtAGcTACTAcGCCACCAATGATTGCTCaAACAaCAGCA


62 TCACCTGGCAgCTCACCaACGCAGTTCTCCACCTTCCCGGATGCGTCCCATGTGAGAATGA

llllllllll I I I M I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I I II I I I I II I I 

62 TCACCTGGCAACTCACCgACGCAGTTCTCCACCTTCCCGGATGCGTCCCATGTGAGAATGA

I I I I I I I I I I I I I I I I I M II I II I I II I II II I II I II I II I II II II I II I El II I 

62 TCACCTGGCAACTCACCAACGCAGTcCTCCACCTTCCCGGATGCGTCCCgTGTGAGAATGA

„ * JLUJUvUiUiiiUJL 1 1 * 1 1 1 1 1 mi hum 

62 TCACCTGGCAACTCACCAACGCAGTtCTCCACCTTCCCGGATGCGTCCCaTGTGAGAATGA 
TCACCTGGCAaCTCACCaACGCAGTtCTCCACCTTCCCGGATGCGTCCCaTGTGAGAATGA 


123 CAATGGCACCtTGCGCTGCTGGATACAAGTaACACCTAATGTGGCTGTGAAACACCGtGGC 

llllllllll 1 1 1 II 1 1 II 1 1 Ml II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M M III 

123 CAATGGCACCCTGCGCTGCTGGATACAAGTGACACCTAATGTGGCTGTGAAACACCGCGGC 

I II I I I M II II I I I I I I I I I I I I I I I I I I I I II I M I I I I I I I I I I I I I I I I I I I I I I 

123 tAATGGCACCCTGCACTGCTGGATACAAGTGACACCTAATGTGGCTGTGAAACACCGCGGC 

I I I I I I I I I I I I I I I M I I I II I II I II I II I I II II II II I II II II I II I II II II II 

123 cAATGGCACCCTGCACTGCTGGATACAAGTGACACCTAATGTGGCTGTGAAACACCGCGGC 
cAATGGCACCcTGC-CTGCTGGATACAAGTgACACCTAATGTGGCTGTGAAACACCGcGGC


184 GCACTcACTCAcAACCTGCGAACgCAtGTCGACGTGATCGTAATGGCAGCTACGGTCTGCT 

Mill mil MUM II 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 1 1 1 

184 GCACTtACTCAtAACCTGCGAACACACGTCGACGTGATCGTAATGGCAGCTACGGTCTGCT 
10 II MM 111 II 1111 Ml II MII Ml I II II II II II II II I II II II II II 1 1 
184 GCgCTCACTCACAACCTGCGAGCACACGTCGATATGATCGTAATGGCAGCTACGGTCTGCT 

„ M llllllll MMIMMIMM I II Ml MM II II II II I II I II I II I II II 

184 GCaCTCACTCACAACCTGCGAGCACAtaTaGATATGATtGTAATGGCAGCTACGGTCTGCT
GCaCTcACTCAcAACCTGCGA-CaCA-gTcGA--TGATcGTAATGGCAGCTACGGTCTGCT


245 CGGCCTTGTATGTGGGgGACGTgTGCGGGGCCGTGATGATaGcGTCGCAGGCTtTCATAAT 

JUJLJLJLUvL 1 1111,11 Mill I I I I I I I I I I I I I I I I I I 1111 Ml I II Ml I III 

245 CGGCCTTGTATGTGGGAGACGTaTGCGGGGCCGTGATGATCGTGTCGCAGGCTcTCATAAT 

... M' lilllllllllll l III I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

245 CGGCCTTGTATGTCGGAGACaTGTGCGGGGCCGTGATGATCGTGTCGCAGGCnTCATAAT

111! MMIMMIMM I II II II II II II II I II I II I II I II II II II I II II I I 

245 CGGCCTTGTATGTGGGAGACgTGTGCGGGGCCGTGATGATCGTGTCGCAGGCTTTCATAgT 
CGGCCTTGTATGTGGGaGACgTgTGCGGGGCCGTGATGATcGtGTCGCAGGCTtTCATAaT 
306 ATCGCCaGAACGCCACAACTTcACCCAGGAGTGCAACTGTTCCATCTACCAAGGTCATATC

linn imiiiiiiiiii iiiiiiiiiiiiiimmiiiiiiiiiiiiiiiiiii 

306 ATCGCCtGAACGCCACAACTTTACCCAGGAGTGCAACTGTTCCATCTACCAAGGTCATATC 

llllll M I I I I I I I I I 1 I I 1 I I 1 I I I I I II I I I I I I I I I 1 I I I | | I | | 1 f | | ] mi 

306 ATCGCCAGAACGCCACAACTTTACCCAAGAGTGCAACTGTTCCATCTACCAAGGTCqTATC 

1 1 1 1 1 1 1 1 1 1 1 mi mmimmimmimmiiiiiiimii in 

306 ATCGCCAGAACaCCACcACTTTACCCAAGAGTGCAACTGTTCCATCTACCAAGGTCacATC
ATCGCCaGAACgCCACaACTTtACCCA-GAGTGCAACTGTTCCATCTACCAAGGTCatATC 


367 ACCGGCCACCGCATGGCATGGGACATGATGCTgAACTGGTCACCAACTCTcACCATGATCC 

mmimmimmimmiiiii mmimimm miiiiiiii 

367 ACCGGCCACCGCATGGCATGGGACATGATGCTAAACTGGTCACCAACTCTTACCATGATCC 

1 1 II 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 !! I 1 1 1 1 1 1 1 ! i J ! I 

367 ACCGGCCACCGCATGGCgTGGGACATGATGCTAAACTGGTCACCAACTCTTACCATGATCC 

Iimni! Ill HIM 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 r 1 1 1 1 1 1 1 miiiiiiii 

367 ACCGGCCACCGCATGGCaTGGGACATGATGCTtAACTGGTCACCAACTCTcACCATGATCC 
ACCGGCCACCGCATGGCaTGGGACATGATGCTaAACTGGTCACCAACTCT-ACCATGATCC 


428 TCGCCTAcGCtGCTCGTGTgCCTGAaCTAGtCCTtgAaGTTGTCTTCGGCGGCCATTGGGG

„ mini n mum mu iiii iii i imiiiimimmim imim 

428 TCGCCTATGCCGCTCGTGTTCCTGAGCTAGcCCTccAgGTTGTCTTCGGCGGCCATTGGGG 

I M HI HI I M 1 II I I I II I M I II III 111 | I II I II I II 1 II SI I II II II II 

428 TtGCCTATGCCGCTCGTGTTCCTGAGCTAGTCCTTGAAGTTGTCTTCGGCGGCCATTGGGG 

„ i m 1 1 1 1 1 1 1 1 1 1 1 r 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ilium u ilium 

428 TcGCCTATGCCGCcCGTGTTCCTGAGCTAGTCCTTGAAGTcGTCTTCGGtGGtCATTGGGG
TcGCCTAtGCcGCtCGTGTtCCTGAgCTAGtCCTtgAaGTtGTCTTCGGcGGcCATTGGGG 


489 CGTGGTGTTTGGCTTGGCCTATTTCTCCATGCAaGGAGCGTGGGCCAAAGTCATcGCCATC 

cQ in in 1 1 1 1 1 1 1 1 in in 1 1 1 m 1 1 m 1 1 mu m m mu in i min 

489 CGTGGTGTTTGGCTTGGCCTATTTCTCCATGCAgGGAGCGTGGGCCAAAGTCATTGCCATC 
Q IN I II I II I I 1 I I I I II I II II I II I II I II I 11111111111111 [ I I I I I | | | | [ | 
489 CGTGGTGTTTGGCTTGGCCTATTTCTCCATGCAaGGAGCGTGGGCCAAGGTCATTGCCATC 
OQ IN II II I II II II II I II I II I II I II I II I IIIIIIIIMIIIIIIIIIIIIIIIII 
489 tGTGGTGTTTGGCTTGGCCTATTTCTCCATGCAgGGAGCGTGGGCCAAGGTCATTGCCATC 

cGTGGTGTTTGGCTTGGCCTATTTCTCCATGCA-GGAGCGTGGGCCAA-GTCATtGCCATC


550 CTCCTcCTTGTCGCAGGAGTGGAcGCA

inn mmimimm in 

550 CTCCTtCTTGTCGCAGGAGTGGATGCA

inn nnmmmmiiiii 

550 CTCCTgCTTGTCGCAGGAGTGGATGCA

Hill Hill I I I I 1 I I I I I I 1 1 I 1 

550 CTCCTtCTTGTaGCAGGAGTGGATGCA
CTCCTtCTTGTcGCAGGAGTGGAtGCA
1 tTAGAGTGGCGGAATGTGTCcGGCCTCTAcGTCCTTACCAACGACTGTtCCAATAGCAGTA

ii in in m mi i in ii mu i mm mu imitiitin 

1 CTAGAGTGGCGGAATGTGTCTGGCCTCTATGTCCTTACCAACGACTGTcCCAATAGCAGTA

IN III IN HI I I I I I III III III III I III III III III III III I I Mill | | 

1 CTAGAS^CG^TACGTCTGGCCTCTATGTCCTcACCAACGACTGTTCCAATAGCAGTA 

Ml I III M 1 1 M II I III 1 1 1 1 III 1 1 1 1 INI I I ill II III III 1 1 1 1 II II 1 1 1 1 

1 CTAGAGTGGCGGAATACGTCTGGCCTCTATaTCCTTACCAACGACTGTTCCAATAGCAGTA 

m m m m 1 1 m i m m m m m m m m m i m m i mm I 

1 CTAGAGTGGCGGAATACGTCTGGCCTCTATgTCCTTACCAACGACTGTTCCAATAGCAGTA 
cTAGAGTGGCGGAATacGTCtGGCCTCTAtgTCCTtACCAACGACTGTtCCAATAGCAGTA 
62 TcGTGTATGAGGCCGATGACGTCATTCTGCACACACCTGGCTGTGTACCTTGTGTTCAGGA

„ Mill I II 1 1 1 1 III 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1| 1 1 1 

62 TTGTGTATGAGGCCGATGACGTCATTCTGCACACACCTGGCTGTGTACCTTGTGTTCAGGA


62 TTGTGTATGAGGCCGATGACGTtATTCTGCACACACCTGGCTGTGTACCTTGTGTTCAGGA

„ mil 1 1 1 1 1 1 1 1 1 1 in 1 1 1 1 mm mm 1 1 mmmmmmmii 

62 TTGTCTATGAGGCCGATGACGTCATTCTGCACACACCCGGCTGTGTACCTTGTGTrCAGGA 

'Ll 1 1 1 1 1 1 1 1 1 1 1 1 1 1 m m m Ml III III III III III III III III | | mi Ml 

62 TTGTGTATGAGGCCGATGACGTCATTCTGCACACACCCGGCTGTGTACCTTGTGTTCAGGA
TtGTGTATGAGGCCGATGACGTcATTCTGCACACACCtGGCTGTGTACCTTGTGTTCAGGA 
123 CGGCAATACATCtACGTGCTGGACCTCaGTGACgCCTACAGTGGCAGTCAGGTACGTCGGA

,, 1 1 1 1 1 H i 1 1 1 1 1 1 ii hum in i mu mm m mm mmmm 

123 C^C^TACATCCACGTGCTGGACCTCgGTGACACCTACAGTGGCAGTCAGCTACGTCGGA 

m mmimmiiiiiiii i mmmmmmmmii mm 

123 9^tAATACATCCACGTGCTG<^CCCC^TGACACCTACAGTGGCAGTCAGGTAtGTCGGA 

iii ii in i m i m m ii i ii i m ii i m ii ii mimiiiiiii mm 

123 cggcaatacatccacgtgctggaccccagtgacacctacggtggcagtcaggtacgtcgga 
1 1 1 1 1 HI HI Ml IN HI HI Ml III III III III III III III III I III III III 
123 CGGCAATACATCCAtGTGCTGGACCCCAGTGACACCTAOMTGGCAGTCAGGTACGTCGGA 

CGGcAATACATCcAcGTGCTGGACCcCaGTGACaCCTACaGTGGCAGTCAGGTAcGTCGGA


184 GCAACCACCGCtTCGATACGCAGTCATGTGGACCTGcTAGTGGGCGCGGCCACGATGTGCT

1 1 1 1 1 HI HI I I I I Ml III II III II II Ml II III III II III Ml II II III III 

II III II III I Ml I II III I II I III II III II II III III II II III I III III I 

184 GCAACCACCGCTTCGATACGCAGTCATGTGGACCTATTgGTGGGCGCGGCCACtATGTGCT 

m i m m i m 1 1 m m 1 1 m m iii m m i i m 1 1 m iii 1 1 ii ii ii 

1 8 4 G<^CCACCGCCTC(^TACGCAGTCATGTGGACCTATTAGTGGGCGCGGCCACGCTCTCCT 

i n. JLJLiiJLJLiJLJLJLJL ** 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 iii 1 1 1 1 1 1 iii 1 1 1 1 1 1 iii 1 1 ill 

184 GCAACCACCGCTTCGATACGCAGTCATGTGGACCTATTAGTGGGCGCGGCCACGCTGTGCT 
GCAACCACCGCtTCGATACGCAGTCATGTGGACCTatTaGTGGGCGCGGCCACgaTGTGCT 
245 CTGCGCTCTACGTGGGtGATgTGTGTGGGGCCGTCTTCCTtGTGGGACAAGCCTTCACGTT

HI I I I I II I I I I I I I I I I I II 1 I I I I I I Mil | IN M | m 

245 CTGCGCTCTACGTGGGcGATATGTGTGGGGCCGTCTTCCTCGTGGGACAAGCCTTCACGTT 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II Mill I III II 1 1! ! I! I IIMIIIIIIIIIMIIIIIIII 
245 CTGCGCTCTACGTGGGTGATATGTGTGGGGCCGTCTTTCTCGTGGGACAAGCCTTCACGTT 

MMI.U'Ji' 1111 1111 111 111 11111 mmiMii in in i ii i m i in 

245 CTGCGCTCTATGTGGGTGATATGTGTGGGGCCGTCTTTCTCGTGGGACAAGCCTTCACGTT

I I I I I I I M I I I II I II I I I I I I 1 I I I I I I I I I I I I I I I I I 1 I I I I I | | | | | | ] | | | | | | | 

245 CTGCGCTCTATGTGGGTGATATGTGTGGGGCCGTCTTTCTCGTGGGACAAGCCTTCACGTT 
CTGCGCTCTAcGTGGGtGATaTGTGTGGGGCCGTCTTtCTcGTGGGACAAGCCTTCACGTT 
306 CAGACCtCGTCGCCATCAAACaGTCCAGACCTGTAACTGCTCGCTGTACCCAGGCCAtCTT
1 1 1 1 1 1 III III III II 1 1! IIIIIIIIIMIIIIIIIIIIIIIIIIigilllll III 
306 CAGACCgCGTCGCCATCAAACGGTCCAGACCTGTAACTGCTCGCTGTACCCAGGCCAcCTT 

mill in 

306 CAGACCTCGTCGCCATCAAACGGTCCAGACCTGTAACTGCTCGCTGTACCCAGGCCATCTT 

lllllllllllllllllllllllllllllllllllllllllllllimilMMIIIIII 

306 CAGACCTCGTCGCCATCAAACGGTCCAGACCTGTAACTGCTCGCTGTACCCAGGCCATCTT 
1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 I II I E I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I M I II 1 1 1 1 1 1 1 1 I 1 1 || 
306 CAGACCTCGTCGCCATCAAACGGTCCAGACCTGTAACTGCTCGCTGTACCCAGGCCATgTT 

CAGACCtCGTCGCCATCAAACgGTCCAGACCTGTAACTGCTCGCTGTACCCAGGCCAtcTT 


SEQ ID NO: Isolate

35 DK12 


S2 


S54 

S52 

consensus 


367 TCAGGACATCGAATGGCTTGGGATATGATGATGAATTGGTCCCCCGCtGTGGGTATGGTGG

ii m i ii i ii i ii i ii mu iii mi imimmii 

367 TCAGGACATCGAATGGCTTGGGATATGATGATGAATTGGTCCCCCGCcGTGGGTATGGTGG
_ M Ml MINI II Mi MM 11 MM II MM IM Ml MM MM 1111111111111 
367 TCAGGACATCGcATGGCTTGGGATATGATGATGAATTGGTCCCCCGCTGTGGGTATGGTGG 

-Ij-UiUI-llU-lUilU 1 1 1 1 ii 1 1 1 1 ii 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

367 TCAGGACATCGAATGGCTTGGGATATGATGATGAATTGGTCCCCCGCTGTGGGTATGGTGG

MMMMMMMMMIMIMIMIMIMMIMM MMMMMMIMIIMM 

367 TCAGGACATCGAATGGCTTGGGATATGATGATGAATTGGTCCCCCGCTGTGGGTATGGTGG
TCAGGACATCGaATGGCTTGGGATATGATGATGAATTGGTCCCCCGCtGTGGGTATGGTGG 


Isolate 

DK12 


S2 

S54 

S52 

consensus 


428 TaGCGCACGTCCTGCGtcTGCCCCAGACCTTGTTCGACATAATAGCtGGGGCCCATTGGGG
I III III III III I I I ill III III III III I I I III III III III III III III I I 
428 TGGCGCACGTCCTGCGglTGCCCCAGACCTTGTTCGACATAATAGCCGGGGCCCATTGGGG 

, niiiiiiii mu uuiuiuu mimimimimiuimi miimimi 

428 TGGCGCACGTtCTGCGtTTGCCCCAGACCgTGTTCGACATAATAGCCGGGGCCCATTGGGG 
„ UU M 1 I | HI II 111 I I I I I I III I I 1 I I I III I I III III III I III I I I I 
428 ‘IX^CGCACATCCTGCGATTGCCCCAGACCTTGTTTGACATACTGGCCGGGGCCCATTGGGG 
JvU 1 1 1 ' 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I 1 1 I I 1 1 I I I I I I I I I I HI I III I III i III I II III 
428 TGGCGCACATCCTGCGATTGCCCCAGACCTTGTTTGACATACTGGCCGGGGCCCATTGGGG 

TgGCGCACgTcCTGCG-tTGCCCCAGACCtTGTTcGACATAaTaGCcGGGGCCCATTGGGG



FIGURE 1E


SEQ ID NO: Isolate 

35 DK12 


489 CATCaTGGCgGGCCTAGCCTATTACTCCATGCAGGGCAACTGGGCCAAGGTCGCTATCATC 

1 1 1 1 1 1 1 1 II I I I i I I I I I I I I I I I I I I I I I I I I I I I | | | | | ! | | | | | | | | | | | | | ] | 

4 89 CATCITGGCaGGCCTAGCCTATTACTCCATGCAGGGG^CTOiMCCAAOTTCGCTATCATC 

AOQ 1 1 1 jjiy. J. J J. jJJJ I I I N I I I I I I I I I I I I I I I I I I I I I I I | | | | | | | | | | | | | | | | | 

489 CATCTTGGCGGGCCTAGCCTATTACTCCATGCAaGGCAACTGGGCCAAGGTCGCTATCATC 

I M I I I I I I I I I I I I I I I I II I I I || | | | | | I I I I I I I I j | | | | | | I I I I II I I I I I I 

489 CATCTTGGCGGGCCTAGCCTATTATTCTATGCAGGGCAACTGGGCCAAGGTCGCTATCATC

I I I I I I I I I I I I I I I I I 1 f I I I I I I I I I I I I I | | | | | | | | | | | | | | | III | | || 

489 CATCTTGGCGGGCCTAGCCTATTATTCTATGCAGGGCAACTGGGCCAAGGTCGCTATtgTC

CATCtTGGCgGGCCTAGCCTATTAcTCcATGCAgGGCAACTGGGCCAAGGTCGCTATcaTC 


550 ATGGTTATGTTTTCAGGaGTCGATGCC

I I II I II III III III I IIIIIIMI 

550 ATGGTTATGTTTTCAGGGGTCGATGCC 

I I I I I I I I I I I I I I I I I I I I I I I III 

550 ATGGTTATGTTTTCAGGGGTCGAcGCC 

III lllllllllllllllllll III 

550 ATGATTATGTTTTCAGGGGTCGATGCC 

II I II I I I I I I I I I I I I I I I I I I I I I I 

550 ATGATTATGTTTTCAGGGGTCGATGCC 
ATGgTTATGTTTTCAGGgGTCGAtGCC 



FIGURE 1F


SEQ ID NO: Isolate

43 Z7

42 Z6

42-43 consensus (Z6)


1 ?T^9T^T? a 9^ TCCCTCGGGCGTCTATCACATCACCAACGA CTGCCCGAACTCGAGCA 

i! mini 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 m 1 1 1 1 mmimiiiiiiiiiiiiimi 

1 GTtAACTATCGCAATGCCTCGGGCGTCTATCACGTCACCAACGACTGCCCGAACTCGAGCA 
GTtAACTATCgCAATGCCTCGGGCGTCTATCACgTCACCAACGACTGCCCGAACTCGAGCA 


T^TGTA^t^CC^CACCAGATCCTAGAGCTGG^GGGTCCTG-rACCCTGTGTGAGGGa 

11 I I I I 1 I I I M I 1 I I I 1 I I I III 1 II I I I I I 1 I I I I I I I I | I I I III | | | | | | I 

TAGTGTATGAGGCCGAACACCAgATCTTACACCTCCCAGGGTGCtTgCCCTGTGTGAGGGt
TAgTGTATGAGGCCGAACACCAgATCtTACACCTCCCAGGGTGCtTgCCCTGTGTGAGGGt 


123 gGGGAACCAGTCACGCTGCTGGGTGGCCCTTACTCCCACCGTGGCGGcGcCTTATATCGGT 
HIM I I I I I I I I I I I I I I I I I I I I I I I I III III ! I III | | | | | | I | | | I I I I I I I 
123 tGGGAAtCAGTCACGCTGCTGGGTGGCCCTTACTCCCACCGTGGCGGtGtCTTATATCGGT 

tGGGAAtCAGTCACGCTGCTGGGTGGCCCTTACTCCCACCGTGGCGGtGtCTTATATCGGT 


184 GCaCCGCTTGAaTCCaTCCGGAGACATGTGGACCTGATGGTAGGCGCtGCTACaGTGTGCT

I' IN NMI IN III 11111111111111 III II III HIM Mill II 1111 

184 GCTCCGCTTGAcTCCcTCCGGAGACATGTGGACCTGATGGTGGGCGCCGCTACTGTaTGCT
GCtCCGCTTGAcTCCcTCCGGAGACATGTGGACCTGATGGTgGGCGCcGCTACtGTaTGCT 


245 CcGCtCTCTACaTTGGGGACCTGTGCGGTGGcGtATTtTTGGTTGGtCAGATGTTtTCTTT 
i iL ,IIMI IMI 11 Ml Mil Mil I Ml 1 1 Ml III II 1 1111 1 II II 
245 CtGCCCTCTACgTTGGAGAtCTGTGCGGTGGTGcATTCTTGGTTGGcCAGATGTTCTCCTT 

CtGCCCTCTACgTTGGaGAtCTGTGCGGTGGtGcATTCTTGGTTGGcCAGATGTTcTCcTT 


306 ? m F in' m iin immmm^^ 

306 CCAGCCGCGACGCCACTGGACTACGCAGGACTGCAATTGTTCtATCTACGCAGGGCATATC 
CCAGCCGCGACGCCACTGGACTACGCAGGACTGCAATTGTTCtATCTAcGCaGGGCAtaTc 


367 ACaGGCCACAGaATGGCATGGGACATGATGATGAACTGGAGTCCCACAACCACCtTGcrTCC 

, £ , IJLJJLJLJL 1 1 1 1 ii i ii mi i m 1 1 1 1 mm ii i mi ii i him in Mini ii i i 

367 ACgGGCCACAGgATGGCATGGGACATGATGATGAACTGGAGTCCCACAACCACCCTGcTtC
ACgGGCCACAGgATGGCATGGGACATGATGATGAACTGGAGTCCCACAACCACCcTGcTtC 



SEQ ID NO:
43 


FIGURE 1F


42-43 consensus (Z6) 


4 2 8 J^^^^Jt^TC^GC^TCCCTAGCACTCTGGTgGACCTACTCaCTGGAGGGCACTGGGC 

1 1 1 II II I I f I I I I I I I I I I I I I I I I I II I f I i I I | | | | | | [ M| | | | | | | | [|| | | 

428 TCGCCCAGGTcATGAGGATCCCTAGCACTCTGGTaGAtCTACTCGCTGGAGGGCACTGGGG 
TCGCCCAGGTcATGAGGATCCCTAGCACTCTGGTaGAtCTACTCgCTGGAGGGCACTGGGG 


SEQ ID NO: Isolate


42-43 consensus (Z6) 


489 taTCCTTaTcGGGgTGGCaTACTTCtGCATGCAAGCTAATTGGGCCAAGGTCATtCTGGTC 

HIM I HI Nil Mini I 1 1 1 1 1 1 1 ! 1 1 1 1 1 i 1 1 1 : 1 1 Mill lillij 

489 CgTCCTTGTTGGGtTGGCGTACTTCAGtATGCAAGCTAATTGGGCCAAaGTCATCCTGGTC 
cgTCCTTgTtGGGtTGGCgTACTTCaGtATGCAAGCTAATTGGGCCAAaGTCATcCTGGTC 


SEQ ID NO: Isolate

43 Z7 

42 Z6 

42-43 consensus (Z6) 


550 CTTTTCCTCTaCGCTGGAGTTGATGCC

I I I I I I I I I I I I I I I I I I I I I I I I I I

550 CTTTTCCTCTTCGCTGGAGTTGATGCC
CTTTTCCTCTtCGCTGGAGTTGATGCC





SEQ ID NO: Isolate

45 SA1

47 SA5

49 SA7

46 SA4

50 SA13

48 SA6

45-50 consensus 


FIGURE 1G 

L ?T t ^?rt?^g^T???T9T?9 GGT rTAcCATGTCACCAATGA C TGCCCAAACTCcTCCA 
II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | | | | | | | | | | | I I I I I I I I I I 
L GTCCCCTACCGAAATGCCTCTGGGGTTTATC^TGTCACCAAlXlATTCCCC^JJiCTOTCcJ, 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | 1 | | | | | | || | | | | | || | | | | || [ | 

L fTCCCCTACCC^TG^ 

II M I I I I I I I I I | | | | | I I I I I I I I | | | | I I I I I I I I | | | | | | | | | | | | | | | 

- GTTCCCTACCGAAAcGCCTCTGGGGTTTATCATGTC»C(^I^ITGCCCaJJiCTOTCC^ 

I II 1 1 n 1 1 1 1 III 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Tm I irnimTi M 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 III 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 

GTTCCtTACCGgAATGCCTCTGGGGTgTATCATGTtACCAAIxkATTHSCCcJJlACrCTTCCA 

GTtCCcTACCGaAAtGCCTCtGGGGTtTAtCATGTcACCAATGAtTGCCCaAACTCtTCCA 


SEQ ID NO: Isolate

45 SA1

47 SA5

49 SA7

46 SA4

50 SA13

48 SA6

45-50 consensus


62 7^T?T^?^???r^Tt g ??T^T ct T?9^?? CACCTGGcTGCGTGCCC,rGTGTCA 3 GcA 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Mill 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I I 

62 TAGTCTACC^<^TAACC^ 

,ii 1 1 it i 1 1 ii ill i i mi iii i 1 1 : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 

6 2 T^TCTAt^<^CT^c^CCT(^TCCTG^CG(piCCTGGITGCGTGCCCTGTGTC^Ga^ 
1 1 1 I II 1 1 1 1 1 1 1 1 I II 1 1 1 1 1 | Mil II I II II I I II II II II II II M II M 
62 TAGTtTAC^TCCT^TAACCr^TCrrGCAtGCACCTGGTTGCGTGCCtTGTGTCA^^ 

_ I M II II II II II II II II II II II II II It II II I II II I II I II II I II II I 

62 T cG y^^CG^U^CT(^T(^CCr(^TCTTACACGCACCTX^ITGCGTGCCCTGTGTtAG3c!lA 
c I 1 1 M I II II II II II II II II II II II II II II II I II II II II II II II II I 
62 TaGTCTAtGAGGCTGATGACCTGATCcTACACGCACCTGGcTGCGTGCCCTGTGTccGGaA

TaGTcTAcGAGGCTGAtaaCCTGATc-TgCAcGCACCTGGtTGCGTGCCcTGTGTcaggcA
123 ^aT^TGT^TAOTTGCp^TCC^TCACCCCCACAcTGTCAGCCCCGAcCtTCGGA 

M II II I II II II II II II II II II II 1 1 II I II II II I MIMMIMM I Mill 
123 ^gr^TGT^GTAC^TCCT^TCC^^TCACCCCCACATTGTCAGCCCCGAACCTCGGA 

123 AaATAATGTCAGTAGGTGc!rr^TC(yAAT<^CCCCcicATTGT^ 

i 1 1 1 m 1 1 1 1 1 1 ii M M i ii M M ii ii ii ii ii I mTFTTT i m mT IT? m 

123 A ^T^TGTC^GT^GTGCTCK^TCCAAATCACCCCCACgTTGTCAGCCCCGAAt CTCGGA 
,,, I ll | MMIIII II II 1111 Ml I 1 1 1 II 1 1 1 1 II II 1111 II 1111 II III I 
123 (MgT^TGT^TATCTGCIX^TC^gATCACCCCCACACTGTCAGCCCCGAGCCTCGGA. 
, II MIMMIMM 1111 IIM II II MIMMIMM MMMMMMIIMM 
123 GGaTAATGTCAGTAGaTGCTGGGTtCAtATCACCCCCACACTaTCAGCCCCGAGCCTCGGA 

agaTAATGTCAGTAggTGCTGGGTcCAaATCACCCCCACa-TgTCAGCCCCGAaccTCGGA


45-50 



FIGURE 1G 
SEQ ID NO: Isolate

45 SA1

47 SA5

49 SA7

46 SA4

50 SA13

48 SA6


184 GCGGTCACGGCTCCTCTTCGGAGGGcCGTTGACTACTTAGCGGGAGGaGCTGCtCTCTGC'l 
HI 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 1 Mill IIIEIII 
184 GCGGTCACGGCTCCTCTTCGGAGGGtCGTTGACTACTTAGCGGGAGGGGCTGCCCTCTGCT 

I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 1 I I I I I ! I I I I | 1 | I | | | | | | | | | | | | | 

184 GCGGTCACGGCTCCTCTTCGGAGGGCCGTTGACTACcTAGCGGGAGGGGCTGCCCTCTGCT 

I I in in in i in i mini ii m 1 1 ii i in mi in 1 1 in 

184 GCGGTCACGGCTCCTCTTCGGAGGGCCGTTGACTACTTAGCGGGAGGGGCTGCCCTCTGCT 
I M I I I I I I N I I I I I I II I 1 I I I I I I 1 1 I I I III I I 1 I I I I I I I 1 I 1 1 I 1 1 I I I III 
184 GCGGTCACGGCTCCTCTTCGGAGGGCCGTTGACTACTTAGCGGGgGGGGCTGCCCTtTGCT

1 1 1 1 1 j 1 1 j 1 1 1 1 1 1 1 1 ii 1 1 1 1 ii 1 1 m mii 

184 GCGGTCACGGCTCCTCTTCGGAGGGCCGTTGAtTACTTgGCGGGaGGGGCcGCCCTgTGCT 
GCGGTCACGGCTCCTCTTCGGAGGGcCGTTGAcTACtTaGCGGGaGGgGCCGCcCTcTGCT 


245 CCGCACTATACGTCGGcGACGCGTGCGGGGCAGTGTTtcTGGTAGGCCAAATGTTCACCTA 


consensus 


245 CCGCACTATACGTCGGGGACGCGTCCG(^CAGTCl4cTTX^TATOCCAAATC^C^CCrrA 
MU I I I I III I I III III III I II 11 I I I I I I I I I I 1 I I Ml III I I I III I I III 

245 CCGCgCTATACGTCGGGGACGCGTGCGGGGCAGTGTTTTTGGTAGGCCAgATGTTCAqCTA 

m i N in 1 1 tf^ftTfTTTTfTTTffTfTf i i t i }T ?Tt?? c ?1 L TtT?TT?tTTTf 

245 CCGCGTTATACGTCGGAGACGCGTGCGGGGCAGTGTTTTTGGTAGGtCAAATGTTCACCTA

I I ii ii in mm 1 1 1 1 1 1 n mm 

245 CCGCGTTATACGTCGGAGACGtGTGCGGGGCAtTGTTTTTGGTAGGcCAAATGTTCACCTA 
CCGC-cTATACGTCGGgGACGcGTGCGGGGCAgTGTTttTGGTAGGcCAaATGTTCAcCTA

306 TAGGCCTCGCCAGCATACcACaGTGCAGGACTGCAACTGTTCCATTTACAGtGGCCATATC 

, n fill III III 1111 MM II 1 1 1 1 1 1 1 1 1 1 1 1 1 III III Ml Ml I III III HIM I 

306 TAGGCCTCGCCAGCATACTACGGTGCAGGACTGCAACTGTTCCATTTACAGcGGCCATATC 

iiniiiiimmm i i i i i mi i i i mi mi mi i mi m mi mi mi mi mm 

306 TAGGCCTCGCCAGCACACTACGGTGCAGGACTGCAACTGTTCCATTTACAGTGGCCATATC 

III III Ml Ml Ml Ml Ml I Ml 1 1 Ml Ml 1 1 11 11 1 Ml Ml Ml Ml Ml 1 1 
306 TAGGCCTCGCCAGCACACTACGGTGCAaGACTGCAAtTGcTCtATTTACAGTGGCCATATC 

, n , ALL iJLliiiJL JLJL 1 1 1 11111 in in 1 1 if ii ii m m him i m 

306 TAGcCCTCGCCgGCATAaTgttGTGCAGGACTGCAACTGtTCCATTTAGAGTGGCCAcATC

IN j. II JL II I JL'M I II 1 1 1 Ml I III Ml I I III Ml III Ml I III Ml 

306 TAGgCCTCGCCaGCATgcTacgGTaCAGGACTGCAACTGcTCCATTTACAGTGGCCAtATC 
TAGgCCTCGCCaGCAtactacgGTgCAgGACTGCAAcTGtTCcATTTACAGtGGCCAtATC 

367 ACCGGCCACCGgATGGCtTGGGACATGATGATGAATTGGTCACCTACGACAGCCTTGcTGA

1 1 ii J. ii hi i mu i m m m m m m 1 1 m m m m i m 1 1 m 

367 ACCGGCCACCGAATGGCATGGGACATGATGATGAATTGGTCACCTACGACAGCCTTGGTGA 

I * ' JLJLJL 1 I N I III I I III III III I III I Ml I I I III Ml III Ml I I III Ml 

367 ACCGGCGACCGAATGGCATGGGACATGATGATGAATTGGTCACCTACGACAGCCTTGGTGA

,,, iJLJljJL j. JLIJLJL JL 1 1 1 1 m m i m 1 1 1 1 m m m 1 1 m 1 1 1 1 1 1 1 1 if 1 1 m 

367 ACCTOCCACCGGATGGCATGGGACATGATGATGAATTGGTCACCTACGACgGCCTTGcTGA 

... 1 1 1 HI 1 1 1 1 1 1 1 1 III Hi III III Ml Ml Ml III III Ml Ml II II III III 

367 ACCGGCCACCGGATGGCATGGGACATGATGATGAATTGGTCACCTACaLACAGCtTTGGTGA 

H Nil I M I HI Ml Ml III III III Ml III Ml Ml III I Mill Ml Ml I 

367 ACtGGCCACCGGATGGCATGGGACATGATGATGAATTGGTCACCcgCgACAGCcTTGGTGA 
ACcGGCCACCGgATGGCaTGGGACATGATGATGAATTGGTCACCtaCgACaGCcTTGgTGA 


45-50 



FIGURE 1G 
SA1

SA5 

SA7 

SA4 

SA13 

SA6 

consensus 


428 T?T??“T a TTTft??TtT c 7TT?t 9 TfT?T?tT a ?t < nT?1 t T a TT?Tm?T‘nTrfTTT 
428 

48 

428 TGGCCCAGTTOCTACGGATTCCCCAGGTGGTCATCGACAT^TTGCCCM^rraf^JJJ' 

, m i ii i ii i M iii ii in 1 1 ii mm m 1 1 1 1 1 it 

428 TGGCCCAGTTGtTACGGATTCCCCAGGTGGTCATTGA^T^TOCCa^cr^t^JJ' 

Ml IN I II 1 1 1 1 1 1 1 1 1 1 1 1 III 1 1 1 1 1 1 1 1 1 1 M I mTn fltfm?? 

428 TGGCCCAaaTGcTACGGATTCCCCAGGTGGTCATTGACAT^ITGCCGO^gCCACrrGGGG 

TGGCCCAgtTGcTACGGATtCCCCAgGTGGTCATtGACATCATtGCCGGGGgCCACTGGGG 

4 89 ^T^7j^TT^?999 c 9® < j^y^ ( j7y^999^g < f*" c< f^' c ^^CTGGGCTAAGGTaGTGCTGGTt 
4 89 cLsTCTTxilTCGCCGtCG^TAciTCGCGTCAGCGGCTAAclvLiGCTAA ^ ' 1 111,1111 

i 1 1 jjji 1 1 * 1 1 * i 1 1 1 1 1 1 in i ??T??????ffttTTT???T1tT?TTTT??T??T? 

4 89 GGTCTTGTTCGCCGCCGCATATTTCGCGTCAGCGGCTAACTGGGCTAAGGTTr'Tr' 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 m T mmm 

4 89 7?T^7^TT t 9?9??9?<f^T^Ty^^^GT^GCTCCTAACTGGGCTAA(L3’ITaTaCTQ3TC 

, n _ II I I HI 1 1 II INI II I II | I II | || 1 1| III 1 1| Mini | MUM 

... 1 1 1 Mill 1 1 1 1 1 1 1 III I II I II Mil III 1 1 1 1 1 1 1 1 1 1 1 1 II I 

4 89 GGTCTITX3TTCGCCGCtGCATACTtCGCGTCGGCGGCTAACrrGOTCtJA(M , nX3TGCrG3TC 

GGTCTTGTTcGCCGccGCATAcTtcGCGTC-GCgGCtAACTGGGCtAAGGTtgTgCTGGTc


550 CTGTTcCTGTTTGCGGGGGTCGATGGC

« n I IM I Ml II I I I I M III I II I III 

550 CTGTTTCTGTTTGCGGGGGTCGATGGC 

«„ iiii ii ii mm ii i ii i ii ii i 

550 TTGTTTCTGTTTGCGGGGGTCGATGCC

550 J U1 'Mill 1 

i i i i i i i i i i i n 


550 m 

„„ 1 1 1 1 1 ii mi m mTu iTTTT 

550 tTGTTTCTGTTTGCGGGGGTtGATGCC
-TGTTtCTGTTTGCGGGGGTcGATGcC


45-50 



FIGURE 1H 


SEQ ID NO: Genotype

30-33 (IV/2b)

34 (2c)

26-29 (III/2a)

35-39 (V/3a)

9-25 (II/1b)

1-8 (I/1a)

40 (4a)

42-43 (4c)

44 (4d)

41 (4b)

45-50 (5a)

51 (6a)


^I?^^? AG 9 AACAtCAG ’ rTctAGcTACT AcG c CACCAATGATTGCTCaAACAaCAGCA 
L GTGGAGGTCAAGGACACCGGCGACTCCTACATGCCGACCAACGATTGCTCCAACTCTAGTA 
L GcccAAGTGAagAACACCAgtacCaGcTAcATGGTGACcAAcGACTGtTCcAAtGAcAGCA 
. cTAGAGTGGCGGAATacGTCtGGCCTCTAtgTCCTtACCAACGACTGTtCCAATAGCAGTA 
. tAtGAaGTGCgCAACGTgTCCGGGgtgTAccAtGTCACgAAcGACTGcTCCAACTcaAGca 
. tACCAAGTgCGCAACTCcaCgGGgCTtTACCATGTcACCAATGAtTGCCCTAAcTCGAGtA 
. GAGCACTACCGGAATGCTTCGGGCATCTATCACATCACCAATGATTGTCCGAATTCCAGTA 
GTtAACTATCgCAATGCCTCGGGCGTCTATCACgTCACCAACGACTGCCCGAACTCGAGCA 
TACAACTATCGCAACAGCTCGGGTGTCTACCATGTCACCAACGATTGCCCGAACTCGAGCA 
GTGCACTACCGGAATGCTTCGGGCGTCTATCATGTCACCAATGATTGCCCTAACACCAGCA 
GTtCCcTACCGaAAtGCCTCtGGGGTtTAtCATGTcACCAATGAtTGCCCaAACTCtTCCA 

CTTACCTACGGCAACTCCAGTGGGCTATACCATCTGACAAATGATTGCCCCAACTCCAGCA


1-51 consensus 


A 


TA AC AA GA TG C AA 


SEQ ID NO: Genotype

30-33 (IV/2b)

34 (2c)

26-29 (III/2a)

35-39 (V/3a)

9-25 (II/1b)

1-8 (I/1a)

40 (4a)

42-43 (4c)

44 (4d)

41 (4b)

45-50 (5a)

51 (6a)

1-51 consensus

TCACCTGGCAaCTCACCaACGCAGTtCTCCACCTTCCCGGATGCGTCCCaTGTGAGAATGA

TCGTTTGGCAGCTTGAAGGAGCAGTGCTTCATACTCCTGGATGCGTCCCTTGTGAGCGTAC 

TCACcTGGCAaCTccAgGCcGCGGTcCTCCACGTcCCCGGGTGtgTCCCGTGcGAGAaaqt 

TtGTGTATGAGGCCGATGACGTcATTCTGCACACACCtGGCTGTGTACCTTGTGTTCAGGA 

I^I5^^? 9 ^ A H^ gGACaTGATCaTGCAcACcCCcG Gg T GcgTgCCCTGcGTtCgGGA 

^°^ A S? AG9 ES? CcGATgCcA,rcCT 9 CAcaCc CCgGGgTGTGTcCCTTGCGTTCGcGA 

TAGTCTATGAAGCTGACCATCACATCCTACACTTGCCGGGGTGCGTACCCTGTGTGATGAC 

TAgTGTATGAGGCCGAACACCAgATCtTACACCTCCCAGGGTGCtTgCCCTGTGTGAGGGt
J a ^I^ a I5^^ ccgattaccacatcttacacctcccgg<3atg cgttccttgcgtgaggga 
TAGTGTACGAGACGGAGCACCACATCATGCACTTGCCAGGGTGTGTCCCCTGTGTGCGGAC 
Z! a °Z£^5 AGGCTGAt:aaCCTGATctT 9 CAcGCA CCTGGtTGCGTGCCcTGTGTcaggcA 
TCGTGCTGGAGGCGGATGCTATGATCTTGCATTTGCCTGGATGCTTGCCTTGTGTGAGGGT

T A T T CA CCGGTG T CC TO G 


123 cAATGGCACCcTGCgCTGCTGGATACAAGTgACACCTAATGTGGCTGTGAAACACCGcGGC 

^^ CG ^^ C ^ TGTTGGGTGCCGGrrGCCC CCAATCTCGCCATAAGTCAACCTGGC 


123 ggagaatacttctcgctgctgggtgcccitgacccccactgtggccgcgccctatcccaac 

aga I^I?J CAGTA9gTGCTGGGTcCAaATCAC CCCCACatTgTCAGCCCCGAaccTCGGA 

123 CGATGATCGGTCCACCTGTTGGCATGCTGTGACCCCCACCCTGGCCATACCAAATGCTTCC 


1-51 


consensus 


TG TGG 


T C CC A T C 
184 GCaCTcACTCAcAACCTGCGAaCaCAtgTcGAcaTGATcGTAATGGCAGCTACGGTCTGCT
184 GCTCTCACTAAGGGCCTGCGAGCACACATCGATATCATCGTGATGTCTGCTACGGTCTGTT
184 GCcCTcACGCAGGGCTTGCGGACgCACATcGACATGGTtGTGATGTCCGCCACGCTCTGCT
184 GCAACCACCGCtTCGATACGCAGTCATGTGGACCTatTaGTGGGCGCGGCCACgaTGTGCT
184 gTCcCcACtAcGaCaATACGACgcCAcGTCGAtTTGCTCGTTGGGGCGGCTgctTTCTGcT
184 CTCCCcgCAaCGCAgCTtCGACGTcACATCGAtCTGCTtGTcGGgAGcGCCACCCTCTGcT
184 GCTCCGCTTGAGTCGTTCCGGCGACATGTGGACTTAATGGTAGGCGCGGCCACTTTGTGTT
184 GCtCCGCTTGAcTCCcTCCGGAGACATGTGGACCTGATGGTgGGCGCcGCTACtGTaTGCT
184 GCTCCGCTTGAGTCTTTGAGACGTCACGTGGATCTGATGGTGGGCGGCGCCACTCTCTGCT
184 GCACCGTTAGAGTCCATGCGCAGGCATGTAGACCTGATGGTGGGTGCGGCTACTATGTGTT
184 GCGGTCACGGCTCCTCTTCGGAGGGcCGTTGAcTACtTaGCGGGaGGgGCtGCcCTcTGCT
184 ACGCCCGCAACGGGATTCCGCAGGCATGTGGATCTTCTTGCGGGCGCCGCAGTGGTTTGCT 


T GA 


T G 


GC 


T TG T 


245 CGGCCTTGTATGTGGGaGACgTgTGCGGGGCCGTGATGATcGtGTCGCAGGCTtTCATAaT 
245 CTGCCCTTTATGTGGGGGACGTGTGTGGCGCGCTGATGCTGGCCGCTCAGGTCGTCGTCGT 
245 CcGCtCTtTACGTGGGGGAccTCTGCGGcGGGgTgATGCTCGCaGCcCAgATGTTCATtgT 

245 CTGCGCTCTAcGTGGGtGATaTGTGTGGGGCCGTCTTtCTcGTGGGACAAGCCTTCACGTT

245 CCGctATGTAcGTGGGgGAtCTcTGCGGaTCtGTttTCCTcgTcTCcCAGcTGTTCACctT 
245 CGGCCCTCTAcGTGGGGGACtTGTGCGGGTCTGTCTTtCTtGTCgGtCAaCTGTTcACctT 
245 CTGCCCTCTATGTTGGGGACCTCTGCGGAGGTGCCTTCCTGATGGGGCAGATGATCACTTT

24 5 ^Q^ CTCTACgTT ^ aGAtCTCTGCGGTGGtGcA ^ CTTGGTTGGcCAGAT G' r TcTCcTT 

245 CCGCCTTCTACATTGGAGATCTGTGTGGAGGCGTCTTCCTAGTGGGCCAGCTGTTCGACTT 
245 CCGCgcTATACGTCGGgGACGcGTGCGGGGCAgTGTTttTGGTAGGcCAaATGTTCAcCTA
245 CATCCCTGTACATCGGGGACCTGTGTGGCTCTCTCTTTTTGGCGGGACAACTATTCACCTT

C TTATGGGA TG GG TT CA T 


306 ATCGCCaGAACgCCACaACTTtACCCAaGAGTGCAACTGTTCCATCTACCAAGGTCatATC 
306 GTCGCCACAACACCATACGTTTGTCCAGGAATGCAACTGTTCCATATACCCGGGCCGCATT 
306 CTCGCCGCaaCacCACTgGTTTGTGCAaGAaTGCAAtTGCTCcATcTACCCtGGtACCATC 
306 CAGACCtCGTCGCCATCAAACgGTCCAGACCTGTAACTGCTCGCTGTACCCAGGCCAtcTT 
306 cTCgCCtCGcCggcAtgaGACagtaCAGgAcTGcAAcTGcTCaaTCTATCCcGGcCacgTa 
306 cTCtCCCAGgCgCCaCTGGACaACGCAaGaCTGcAAtTGTTCtATCTAtCCcGGCCAtATa 
306 TCGGCCGCGTCGCCACTGGACCACGCAGGAGTGCAATTGTTCCATCTACACTGGCCATATC 
306 CCAGCCGCGACGCCACTGGACTACGCAGGACTGCAATTGTTCtATCTAcGCaGGGCAtaTc 
306 CCAACCTCGCCGCCACTGGACCACCCAAGACTGCAATTGTTCCATCTACACAGGACATATC 
306 CCGACCGCGCCGGCACTGGACCACCCAGGATTGCAACTGCTCCATCTATCCTGGTCACGTC 
306 TAGgCCTCGCCaGCAtactacgGTgCAgGACTGCAAcTGtTCcATTTACAGtGGCCAtATC 
306 TCAGCCCCGCCGTCATTGGACTGTGCAAGACTGCAACTGCTCCATCTATACAGGCCACGTC 
cc C CA TG AA TG TC T TA GG T 


367 ACCGGCCACCGCATGGCaTGGGACATGATGCTaAACTGGTCACCAACTCTtACCATGATCC 
367 ACGGGACACCGCATGGCTTGGGATATGATGATGAACTGGTCGCCCACTACCACCATGCTCC 
367 ACtGGaCACCGTATGGCATGGGAcATGATGATGAACTGGTCGCCCACggCCACcaTGATCc 


367 ACgGGCCACAGgATGGCATGGGACATGATGATGAACTGGAGTCCCACAACCACCcTGcTtC
367 ACAGGACACAGAATGGCTTGGGACATGATGATGAATTGGAGCCCCACTGCGACGCTGGTCC 
367 TCGGGCCACAGGATGGCCTGGGACATGATGATGAACTGGAGCCCTACCAGCGCGCTGATTA 
367 ACcGGCCACCGgATGGCaTGGGACATGATGATGAATTGGTCACCtaCgACaGCcTTGgTGA
367 ACCGGCCACAGGATGGCTTGGGACATGATGATGAACTGGTCACCCACAACCACTCTGGTCC 
C GG CA G ATGGC TGGGA ATGATG T AA TGG CC C T T 


428 TcGCCTAtGCcGCtCGTGTtCCTGAgCTAGtCCTtgAaGTtGTCTTCGGcGGcCATTGGGG
428 TGGCGTACTTGGTGCGCATCCCGGAAGTCATCTTGGATATTGTTACAGGAGGTCATTGGGG 
tl ® I^ GG ^ AGGc ^ TCCGCGTOCCCGAGGTGA TCaTAGACATCaTtaGCGGgGCtCAcTGGGG 
428 TgGCGCACgTcCTGCGttTGCCCCAGACCtTGTTcGACATAaTaGCcGGGGCCCATTGGGG 
tl ! T a J GG ^g^CTCCGgaTCCCaCAAGCTgTCgTGGAcaTGGTggCgGGgGCCCACTGGGG 
til ^ aGG ^ GCTCCTC ^ aTCC CgC^GCCalpTGGAcATGATCGCrrcGtGCcCACTGGGG 
428 TCGCCCAGATCATGAGGGTCCCCACAGCCTTTCTCGACATGGTTGCCGGAGGCCACTGGGG 
428 TCGCCCAGGTcATGAGGATCCCTAGCACTCTGGTaGAtCTACTCgCTGGAGGGCACTGGGG 
tH ^ G ^^?II^^ T CCCAGGCGCCATGGTCGACCTGCrrIcAGGC^SS^ 
til 3^^^ GA ^ rrACGGATCCCCTCTATCCTAGGTGACTTGCTCACC GGGGGTCACTGGGG 
428 TGGCCCAgtTGcTACGGATtCCCCAgGTGGTCATtGACATCATtGCCGGGGgCCACTGGGG 
428 TATCTAGCATCTTGAGGGTACCTGAGATTTGTGCGAGTGTGATATTTGGTGGCCATTGGGG 

TC GTCC TTGGGCA TGGGG 


489 cGTGGTGTTTGGCTTGGCCTATTTCTCCATGCAgGGAGCGTGGGCCAAaGTCATtGCCATC 
489 TGTAATGTTTGGCCTCGCTTACTTCTCCATGCAGGGATCGTGGGCGAAGGTCATCGTTATC 
489 CGTCaTGTTcGGCtTaGCCTACTTCTCTATGCAGGGAGCGTGGGCGAAaGTCcjTTGTCATC 
489 CATCtTGGCgGGCCTAGCCTATTAcTCcATGCAgGGCAACTGGGCCAAGGTCGCTATcaTC 


til ^^ 9 5 G ^ GCTtGCcTACTAtTCCA TG G tg GG gAACTGGGCtAAGGTttTgATTGTg 
t ® * A ^ G ^ aGCGGGCATAGCGTA ”^cTCCATGGtGGGgAACTGGGCGAAGGTCcTggTaGTg 
489 CGTCCTCGCGGGCTTGGCGTACTTCAGCATGCAAGGCAATTGGGCCAAGGTAGTCCTGGTC 
til S^™ g lH^ t:TGGC g TACTOCaG ^ TCCAAGG TAATTGGGCCAAaGTCATcCTGGTC 
489 CATTCTGGTTGGCATAGCGTACTTCAGCATGCAAGCTAATTGGGCCAAGGTTATCCTGGTC

4 89 AGT TiJ J i~ iGCTGGTCTAGCi u l~n_i-iCAGCATGCAGAGTAACTGGGCGAAGf:TraTr(- , TYy;T i r* 

til ^J^S^ CGCCG ^ TA ^ t ^ CGTCgGCgGC ^ C ^ C ^^g^?c 

489 GATACTACTAGCCGTTGCCTACTTTGGCATGGCTGGCAACTGGCTAAAAGTTCTGGCTGTT


550 CTCCTtCTTGTcGCAGGAGTGGAtGCA 
550 CTCCTGCTGACTGCTGGGGTGGAGGCG 
550 CTt tTGCTggCcGCTGGgGTGGACGCG 
550 ATGgTTATGTTTTCAGGgGTCGAtGCC
550 aTGCTACTcTTTGCcGGcGTtGAcGGg
550 CTGtTGCTgTTtgCCGGCGTcGAtGCG
550 CTTTTCCTCITTGCTGGGGTAGACGCC
550 CTTTTCCTCTtCGCTGGAGTTGATGCC
550 CTGTTTCTCTTTGCTGGAGTCGACGCT 
550 CTATTC 


TTTGCCGGGGTCGAGGGA 
550 tTGTTtCTGTTTGCGGGGGTcGATGcC
550 CTGTTCCTATTTGCAGGGGTTGAAGCA 


1-51 


C GG GT GA G 
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T?TT?fT?tT?m7?TTfffTmtTM : tffT?TTTTTTT? N TTfTT7^flTtfH?^ 

I III? m I mm i m f irnrn m 


I I i II I I I M IN Ml III | j I M | I I | III | I I | | j 

1 HQy^STCLYHVTNDCPNSSIWEAADAII^aPGO/PO/REGJU^RCWAOTPTVATRDGK 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 IIIIIMI! 

1 yqvrnssglyhvtndcpnssivyeaadailhspgcvpc^g^kc^avJJ»ivatri)gk 

i i i i i I i i i i i i i i i i i i i i i i i i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 \ rnmTf TTt m?f 

1 YQVRNSSGLYHVTNDCPNS S I VYETADAILHS PGC^PoJt^ dgimKCT^ 

I I I I I I I II I I I I I I I I I I I I I I | | | I j I I I | | | | | | | | | III | | | | | | | || | j 

1 YQVRNStGLYHVTNDCPNSSIVYETADtlMSPGC^ 

yQVRNStGLYHVTNDCPNSSIVYEaADaILH- PGCVPCVREgnasrCWVavtPTVATRDGK 


62 LPatQLRRyIDLLVGSATLCSALYVGDLCGSVFLVGpLFTFSPRRIWrrQdCNCSIYPGHI 
_ II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | | | | | | | | | | | | | || 
62 LPTaPLRRHIDLLVGSATIjCSALYVGDLCGSVFLVGQLFTFSPRRHWTTQGCNCSIYPGHI 


62 LPirQLRRHIDLLVGSATLCSALYVGDLCGSVFLVGOLFTFSPRRHWTTODCNCSTYPrHT 

HI I II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I Ml I U 1 1 1 muT 1 1 IN 1 1 1 1 1 1 

® ^ ^ atqlerh I DLLVGSATLCSALYVGDLCGS VFLVGQLFTFS PRRHWTTQDCNC S I YPGH1 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

J I j I IJ I I I I 1 I I I I I I I I I I I I I I I I I II I I I | | | | | | | I | M || | | | | | | | | | | | || | 

62 i>PATQLRRHIDUjVGSATLCSALYVGDLCGSVFLVSQLFTiSPRRHWTTQDCNCSIYPGHI 

LP - tQLRRhlDLLVGSATLCSALYVGDLCGSVFLVgQLFTf SPRrhWTTQdCNCSIYPGHI 


123 

123 XAYF-SMVGlUJJc^^ 


123 I P?AI^MIAGAHWGVIAGIAYFSM<rGl^^W 

. I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | | | | | | | | | | | | | | | | I I I I I I 

123 

123 mTu rf TuTT? TT^ynf? 

123 TGH^WDMMMl^ 

123 TGH RMA WDMMMNWS PTrJivi J^llJAvPQ^^ 

TGHRMAWDMMMNWS PTtALVvAQLLRi PQAi LDMIAGAHWGVLAGIAYFSHvGNWAKVl vV 
1 YEVRNVSGmYHVTNDCSNSSIVfEAaDlIMHTPGCVPCVREgNsSRCWVALTPTLAARNtS

I I I I I I I I I I i I I I I I I I I I I II I 1111111111111 | I | | | | | | | | | | | l | | | 

1 YEVRNVSGvYHVTNDCSNSSIVYEAvDvIMHTPGCVPCVRENNhSRCWVALTPTLAARNAS

II IMI llllllllllllllll I I II I II III I II! || | | | | | | | | j | | | | | | | | 

1 hEVhNVSGiYHVTNDCSNSSIVYEAADMIMHTPGCVPCVRENNSSRCWVALTPTLAARNAS

ii mi iiiiiiiiiiiii[i]iiiiiiiiiiiiiiiiijiiiii:i:iiiiiiiii: 
MINIM M I I I I I I I I 1 I I II I I I 1 I I 1 I I 1 I I I II I I I I I I I I I I II I i I I I I I

1 YEVRNVSGVYHVTNDCSNSSIVYEAADMIMHTPGCVPCVREGNfSsCWVALTPTLAARNAS

I I I II I I I I I I I I I I I I I I I I 1 I I I II I I I I II I I I I I I II I I I I I I I I II I I I I I I I I 

1 YEVRNVSGVYHVTNDCSNSSIVYEAADMEMHTPGCVPCVREGNSSRCWVALTPTLAARNAS 

I M I I II I I I I I I I I I I I I M I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I 

1 YEVRNVSGVYHVTNDCSNSSIVYETADMIMHTPGCVPCVREaNSSRCWVALTPTLAARNtS 

I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 

1 YEVRNVSGIYHVTNDCSNSSvVYETADMIMHTPGCVPCVRENNSSRCWVALTPTLAARNVS 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I 

1 YEVRNVSGIYHVTNDCSNSSIVYETADMIMHTPGCmPCVRENNSSRCWVALTPTTiAARNVS 

II I I I II I I I I II I I I II I I I I I I I I I M II I I Mill I I I I I II I I I I I I I I I I I 

1 YEVRNVSGVYqVTNDCSNSSIVYETADMIMHTPGCVPCVREdNSSRCWVAIiTPTLAARNsS 

I II I I I I I I I I I I I M I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 

1 YEVENVSGVYyVTNDCSNSSIVYETADMXMHTPGCVPCVREsNSSRCWVALTPTIiAARNAS 

M I I I I I I I I I I I I I I I llllll I I I I I I I I I I I I I II I I I I 1 I I I I I I I I I I M I 

1 YEVRNVSGVYHVTNDCSNISIVYETtDMIMHTPGCVPCVRENNSSRCWVALaPTLAARNAS 

II I I I I I I I I I I I I I I I Mill I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
I II I I I II I II I II I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I

1 YEVRNVSGm YHVTNDCSNS S I VYEAADMIMHTPGCVPCVRENNSSRCWVALTPTIiAARNSS 

I I I I I I I I II I I I I I I I I I I I I I II I II I I I I I I I I I I II II I I I I I I I I I I I I II I I I I 

1 YEVRNVSGVYHVTNDCSNSSIVYEAADMIMHTPGCVPCVRENNSSRCWVALTPTLAARNSS 

M I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I 1 I 1 I I I I I I I I I I I I I I I 

1 YEVRNVSGVYHVTNDCSNSSIVYEtADMIMHTPGCVPCVREdNSSRCWVALTPTLAARNqn 

M I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I III I I I I I I I I I I I I I 

1 YEVRNVSGaYHVTNDCSNSSIVYEaADvIMHTPGCVPCVqEgNSSqCWVALTPTIiAARNat 
yEVrNVSGvYhVTNDCSNsSiVyEaaDmlmHTPGCvPCVrEnNsSrCWVALtPTLAARNas
62 vPTTTIRRHVDLLVGAAAFCSAMYVGDLCGSVFLVSQLFTFSPRRHETlQDCNCSIYPGHl

l!!t!!l !!! !l!i!!l!!lllll III !!l!!!l!!l!!!!!! Ill ! Illlllillli 

62 I PTTTI RRHVDLLVGAAAFCSAMYVGDLCGSVFLVSQLFTFSPRRHETaQDCNCSI YPGHV 

I I ! I I I I I I I I I I I I I i I I I I I I I ! I t I I I I I I I I II I I I I I 1 I I I I I I I I ! I I I I I I I I 

62 IPTTTIRRHVDLLVGAAAFCSAMYVGDLCGSVFLVSQLFTFSPRRHETVQDCNCSIYPGHV 

I 1 1 1 1 1 1 1 ! 1 1 E 1 1 1 1 1111111111111111 llillllll 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

62 VPTTTIRRHVDLLVGAAtFCSAMYVGDLCGSVFLiSQLFTFSPRqHETVQDCNCSIYPGHV 

II III III I I! I I II I I I I I I I I I 1 M M I ! I I llillllll I I I I I I I I ! I I | I I I I 

62 VPTTTI RRHVDLLVGAAAFCSAMYVGDLCGSVFLVSQLFTFSPRRHETVQDCNCS I YPGHV

I llll! 1 1 1 1 II I i 1 1 1 1 i 1 1 M i 1 1 M 1 1 1 1 1 1 i 1 1 1 1 M 1 1 M 1 1 1 1 1 1 i 1 1 1 i 1 1 1 

62 VsTTTIRhHVDLLVGAAAFCSAMYVGDLCGSVFLVSQLFTFSPRRHETVQDCNCSIYPGHV 

I Mill I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I 

62 VPTTTI RRHVDLLVGAAAFCS vMYVGDLCGSVFLVSQLFTFSPRRHETVQDCNCS I YPGHV

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I il I I I I I I I I I I I MIM 

62 VPTTTI RRHVDLLVGAAAFCSAMYVGDLCGSVFLVSQLFTFSPRRHETVQDCNCS1YPGHV 

I I I I II I I 1! I I I I I I I I I I I I i I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I MIM 

62 VPTTTI RRHVDLLVGAAAFCSAMYVGDLCGSVFLVSQLFTFSPRRHETVQDCNCS I YPGHV 

I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I II I I I I I I I I I I I I I ! I I I I I 1 

62 VPTTTIRRHVDLLVGAAAFCSAMYVGDLCGSVFLVSQLFTFSPRRHETVQeCNCSIYPGHV 

III I I I I I I I I I I I I I I I II I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

62 VPTkTIRRHVDLLVGAAAFCSAMYVGDLCGSVFLVSQLFTFSPRRHETVQDCNCSIYPGHV 

III I I I I I II I I I I I I I I I I M I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 

62 VPTTalRRHVDLLVGAAAFCSAMYVGDLCGSVFLVSQLFTFSPRRHETVQDCNCSIYPGHV 

Mil I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I 

62 VPTTTIRRHVDLLVGAAAFCSAMYVGDLCGSVFLVSQLFTFSPRRHETVQDCNCSIYPGHV 

I I I I I I I I I I I I I I I I I I I I II I II I I I I II I I I I I I I I I I I I I I I I I I I I I I II I I I I 

62 VPTTTIRRHVDIiLVGAAAFCSAMYVGDLCGSVFLVSQLFTFSPRRyETVQDCNCSIYPGrV 

MM II I I I I I I I I I II I I II I I I I II I I I I I I I IE I 1 I I I I I I I I I I I I II I I I I 

62 VPTTAI RRHVDLLVGAAAFCSAMYVGDLCGS VILVSQLFTFSPRRHwTVQDCNCS I YPGHV

II I I I I I I I I I I I I I I I I I I II I II I I I I I I I I Mill Mill III I I I I I I I I I I 

62 VPTTAIRRHVDLLVGAAAFCSAMYVGDLCGS VFLISQLFT1 SPRRHETVQe CNCS I YPGHV

Mil 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M I llillllll 1 1 1 1 1 1 1 1 1 1 

62 VPTTt I RRHVDLLVGAAvFCSAMYVGDLCGS VFLI SQLFT iS PRRHETVQnCNCS I YPGHV
123 SGHRMAWDMMMNWSPTTALVvSQLLRI PQAVmDMVtGAHWGVLAGLAYYSMAGNWAKVL I 1

ll!llllii:i!llllllll llllllllli !!! I !!! 1 1 ! 1 1 ! I ! 1 1 1 ' 1 1 ! 1 1 1 ! I 

123 SGHRMAWDMMMNWS PTTALV1 SQLLRI PQAVvDMVAGAHWGVLAGLAYYSMAGNWAKVLI’ 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Ml Mill Mil limililllllllllll 1 1 1 1 1 1 ! I 

123 SGHRMAWDMMMNWSPTAAL WSQLLRI PQAVMDMVAGAHWGVLAGLAYY SMVGNWAKVLI'' 

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

123 S GHRMAWDMMMNW S PTAAL W S QLLRI P QAVMDMVAGAHWG VliAG LA YY SMVGNWAKVL I ^ 

1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1111111 !! 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! I 

123 S GHRMAWDMMMNWS PTAAL WSQLLRI PQAWDMVAGAHWG I LAGLAYYSMVGNWAKVLL 

iiiiiiiiiiiiiiiiiiiiiiiiiiiiiii:iiii!iiiiii:iiiii:iiii!i!iM 

123 SGHRMAWDMMMNWS PTAAL WSQLLRI PQAWDMVAGAHWG I LAGLAYYSMVGNWAKVLL 

iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii i;iiiii:iiii!ii!ii 

123 SGHRMAWDMMMNWS PTAAL' WSQLLRI PQAVVDMVAGAHWGVLAGLAYYSMVGNWAKVLI^ 

I I I I I I I I I 1 I I I I 1 I I I I I 1 II I I I I I I I I I 1 1 I I I I I I I I I 1 I I I I I 1 1 I I 1 I I I I 1 I 

123 SGHRMAWDMMMNWSPTAAL WSQLLRI PQAWDMVAGAHWGVLAGLAYY SMVGNWAKVL I ^ 

II I II III IIMMI I I I I I I I I I 1 I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I ! M 

123 S GHRMAWDMMMNW S PT tALWSQLLRI PQAi VDMVAGAHWGVLAGLAYYSMVGNWAKVLL 

1 1 M in in iii i! iiiiiiiiiiiii iiMimiiinimiiiiMiimi 

123 TGHRMAWDMMMNWSPTaALWSQLLRI PQAWDMVAGAHWGVLAGLAYY SMVGNWAKVLL 

llllllllllllllll l!!!!l!!!!!!!!!l!l!! !!!!!! !!!!!!!!!!!!!! !!! 

123 TGHRMAWDMMMNWS PTTALWSQLLRI PQAWDMVAGAHWGVLAGLAYY SMVGNWAKVLL 

I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I 

123 TGHRMAWDMMMNWS PTTALWSQLLRI PQAWDMVAGAHWGVLAGLAYY SMVGNWAKVLI''- 

llllllllllllllll I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I 

123 TGHRMAWDMMMNWSPTaALWSQLLRI PQAWDMVAGAHWGVLAGLAYY SMVGNWAKVLI' 

llllllllllllllll IIIIIIIIIIIII I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

123 TGHRMAWDMMMNWS PT tALWSQLLRI PQAI VDMVAGAHWGVLAGLAYY SMVGNWAKVLI' 1 

lllllllllllllll llllllllllllll I !!!l!!l!! !!!!!!!! !!!!!!!!! 

123 s GHRMAWDMMMNWS PTaALWSQLLRI PQAI 1 D vVAGAHWGVLAGLAYY SMVGNWAKVLI!

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 IIIIIIIIIIIII I II! II! Ill I!! !!! !!!!!!!!!!! 

123 TGHRMAWDMMMNWS PTTALWSQLLRI PQAVMDMVAGAHWGVLAGLAYY SMVGNWAKVLI! 

II! Ill 111! Ill I II I III II! Ill HIM II! II! II! II! Ill I! ill II ! II! !!! 

123 TGHRMAWDMMMNWS PTTALWSQLLRI PQAVMDMVAGAHWGVLAGLAYY SMVGNWAKVL J\
S GHRMAWDMMMNWS PTaALVvSQLLRi PQAwDmVaGAHWG vLAGLAYYSMvGNWAKVLIl
1 AQVrNTsrgYMVTNDCSNeSITWQLQAAVLHVPGCiPCErlGNTSRCWIPVtPNVAVRQPG

in ii iiiiiiiii him in im ii i ii mi niiiiiiii iiiimii 

1 AQVKNTtnSYMVTNDCSNDSITWQLQAAVLHVPGCVPCEktGNTSRCWIPVSPNVAVRQPG 

I INI IIIIMIIIIIIIIIIMI llllllllllll II 1 1 1 1 1 1 1 1 1 1 1 1 1 II 

1 AeVKNTSTSYMVTNDCSNDSITWQLQAAVLHVPGCVPCErVGNaSRCWIPVSPNVAVQRPG 

I I I I I! I I I I I I I I I I I I I I I I I Ml IIIIIIIII I III lllllllllllllllll 

1 vqVKNTSTSYMVTNDCSNDSITWQLeAAVLHVPGCVPCEkVGNtSRCWIPVSPNVAVQRPG 
aqVkNTstsYMVTNDCSNdSITWQLqAAVLHVPGCvPCE - vGNtSRCWIPVs'PNVAV- - PG 


I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I HIM iiiiiiiii 

ALTQGLRTHI DMWMS ATLCSALYVGDLCGGVMLAAQMF I VS PQHHWFVQdCNCS IYPGT 

I I I I I II I I I I I I I I I I I I I I I I I | I I I I I I I I I I I I I 9 I IIIIIIIII IIIIIIIII 

ALTQGLRTHI DMWMS ATLCSALYVGDLCGGVMLAAQMF I i S PQHHWFVQECNCS IYPGT 

llllillllllllllllllllllllll III llllllil II II lllllllllllll 

ALTQGLRTHIDMWMSATLCSALYVGDf CGGmMLAAQMF I vS PrHHs FVQECNCS IYPGTI 
ALTQG LRTH I DMWMS ATLCSALYVGD 1 CGG vMLAAQMF I vS P - hHwFVQeCNCS IYPGTI 


I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II 

TGHRMAWDMMMNWS PTATMI LAYAMRVPEVI lDIvSGAHWGVMFGLAYFSMQGAWAKVWI 

I I I I I I I I I I I I I I I I llllllllllllll II I I I I I I I I I I I I I I I II I I I I I I I I I 

TGHRMAWDMMMNWS PT t TMI LAYAMRVPEVT ID 1 1 SGAHWGVMFGLAYFSMQGAWAKVWT 

I I I I I I I I I I I I I I I I I llll llllllllllllllllll I I I I I I I II I I I I I I I I I 

TGHRMAWDMMMNWS PT aTl I LAY vMRVPEVT I D 1 1 S GAHWG VI FGLAYFSMQGAWAKVWI 
TGHRMAWDMMMNWS PTaTmlLAYaMRVPEVI i D I i sGAHWGVmFGLAYFSMQGAWAKVvVI
1 VEVRNtSSSYYATNDCSNnSITWQLTNAVLHLPGCVPCENDNGTLHCWIQVTPNVAVKHRG

HU! Illlilllilli I i I I I I I I I I I I I I I I I I I I I I I i I | | | | | | | | | | | | | j | ! | 

1 VEVRNiSSSYYATNDCSNsSITWQLTNAVLHLPGCVPCENDNGTLHCWIQVTPNVAVKHRG 

HIM I llllllllll I I I I I I I I I I I I I I ! j I I j I I I I I | I | | | | | | | | | | | | | | | 

1 VEVRNtSfSYYATNDCSNNSITWQLTNAVLHLPGCVPCENDNGTLRCWIQVTPNVAVKHRG 

Mill I I I II II I I I I I I I I I I I I I I M I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 

1 VEVRNiSsSYYATNDCSNNSITWQLTdAVLHLPGCVPCENDNGTLRCWIQVTPNVAVKHRG 
VEVRN - SsSYYATNDCSNnS ITWQLTnAVLHLPGCVPCENDNGTL - CWI QVTPNVAVKHRG 


62 ALTHNLRAH i DMI VMAATVCS ALYVGD vCGAVMI VS QAF I vS P EhHhFTQE CNC S I Y OGh I 

II I II I II I I I I I I I I I I I I I I I I II 1 I I I I II I I I I I III I II II I I I I II M I 

62 ALTHNLRAHVDMIVMAATVCSALYVGDmCGAVMIVSQAFI ISPERHNFTQECNCSIYOGrl 

1 1 1 1 1 1 1 III 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 llllll II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 I 

62 ALTHNLRTHVDVIVMAATVCSALYVGD VCGAVMI aSQAF 1 1 S PERHNFTQECNCS IYQGHI

II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I M I I I I I I I I I I II I I I II 

62 ALTHNLRTHVDVIVMAATVCSALYVGD VCGAVMI vSQAl 1 1 S PERHNFTQECNCS IYQGHI
ALTHNLR-HvD - I VMAATVCS ALYVGD vCGAVMI vSQAf I i SPErHnFTQECNCS IYQGhI 


123 TGHRMAWDMMLNWS PTLTMI LA Y AAR VP E LVLE WFGGHWG WFG LAY F S MQGAWAKVI AI

i Mini Mini mi in in in n i in ii 1 1 ii in in nil in 1 1 1 1 1 1 1 ii n

123 TGHRMAWDMMLNWS PTLTMI LAYAARVPELVLEWFGGHWGWFGLAYFSMQGAWAKVIAI

I I 1 I I I I I I I I 1 I I II I I I I I I I I I I I I 1 I I I I 1 I I I II I II I I I I I I I II 1 I I I I I I I I I

123 TGHRMAWDMMLNWS PTLTMI LAYAARVPELVLEWFGGHWGWFGLAYFSMQGAWAKVIAI

M I II I I II I II I I II I II I II I II I II I I I I I I I I I I I I I I II I I II I I I I I I II I I I

123 TGHRMAWDMMLNWS PTLTMI LAY AAR VP ELaLqWFGGHWGWFGLAYFSMQGAWAKVIAI
TGHRMAWDMMLNWS PTLTMI LAYAARVP E L vLe WFGGHWGWFG LAY FSMQGAWAKVIAI
1 LEWRNVSGLYVLTNDCsNSSIVYEADDVILHTPGCVPCVQDGNTSTCWTSVTPTVAVRYVG

II III II I III III !! Illllllllll I! Ill III III lllllllll!lllll!!ll!!l 

1 LEWRNVSGLYVLTNDCpNS S I VYEADD VI LHT PGCVP CVQDGNTSTCWT S VTPTVAVRYVG 

uni inmim iiiiiiiimiiiii imiiiiiiiiiiiii iiiiimiiii 

1 LEWRNTSGLYVLTNDCSNSSIVYEADDVTLHTPGCVPCVQDGNTSTCWTPVTPTVAVRYVG 

1 1 1 1 1 1 1 1 1 1 Iimmii INI III II Ml mill Ml I mi III IIMII 1 I III 

1 LEWRNTSGLYiLTNDCSNSS IVYEADDVI LHTPGCVPCVQDGNTSTCWTPVTPTVAVRYVG 

MIMIMII J f 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 

1 LEWRNTSGLYvLTNDCSNSS I VYEADDVILHTPGCVPCVQDGNTSmCWTPVTPTVAVRYVG 
LEWRNtSGLYvLTNDCBNSS I VYEADDVILHTPGCVPCVQDGNTStCWTp VTPTVAVRYVG 


62 ATTASIRSHVDLLVGAATMCSALYVGDvCGAVFLVGQAFTFRPRRHOTVQTCNCSLYPGHL 

1 1 I f I 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 ] I M I M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 

62 ATTAS I RS HVD LL VGAATMC SAL YVGDMCGAVFLVGQAFT FRPRRHQTVOTCNC SLYPGHL

1 1 1 1 1 ! 1 1 1 1 1 i II 1 1 1 1 II 1 1 : 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 II ! 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 

62 ATTAS I RS HVD LL VGAATMC SAL YVGDMCGAVFLVGQAFT FRPRRHQTVOTCNC SLYPGHL 

lllllilllllllillll I I I I I M I I I I I I I 1 I I I I I I I I I I I I I I I I I I I j I I I I | | I 

62 ATTAS I RS HVDLLVGAATLC SAL YVGDMCGAVFLVGQAFT FRPRRHQTVOTCNC SLYPGHL 

I Ml III III III III II Mil II I llllll III I II III I III III II I II III Ml II 

62 ATTAS I RS HVD LLVGAATLC S AL YVGDMCGAVFLVGQAFTFRPRRHQTVQTCNC S LYPGHv
ATTAS I RS HVD LLVGAATmC SALYVGDmCGAVFLVGQAFTFRPRRHQTVQTCNC SLYPGH1 


123 SGHRMAWDMMMNWS PAVGMWAHVLRLPQTLFDI IAGAHWGImAGLAYYSMQGNWAKVAI I 

Mil IIIMI mini III MIIIM II Ml IMIMIMI I 1 1 1 1 r I J 1 1 1 1 1 1 1 II I 

123 SGHRMAWDMMMNWS PAVGMWAHVLRLPQTLFDI IAGAHWGILAGLAYYSMQGNWAKVAI1 

MMMMMMIM III MMIMMIM 1 1 M 1 1 1 1 M Ml I M III 1 1 IM 1 1 III I 

123 SGHRMAWDMMMNWS P AVGMWAHV LRL P QTv FD I IAGAHWGILAGLAYYSMQGNWAKVAI I 

„ ii 1 1 1 1 u i ii 1 1 1 1 n 1 1 1 1 1 1 linn in iiiiiiiiimmiimiiiii! 

123 SGHRMAWDMMMNWSPAVGMWAHILRAPQTLFDILAGAHWGILAGLAYYSMQGNWAKVAII 

Ml III MMIMMIM III IM MIMIIMI Ml III MMMMMMMIIMI I 

123 SGHRMAWDMMMNWS P AVGMWAH I LRL P QTL FD I LAGAHWG I LAG LAYYS MQGNWAKVA I v 
SGHRMAWDMMMNWS PAVGMWAH vLRLPQTl FD I iAGAHWG 1 1 AGLAYYS MQGNWAKVAI i 


184 MVMFSGVDA 

I II 

184 MVMFSGVDA
I Mlllll
lllllllll
1 VNYhNASGVYHiTNDCPNSSImYEAEHHILHLPGCVPCVReGNQSRCWVALTPTVAAPYIG

III lllilll 111111111 Mill IIIIIEI Mil llllllllltllill III 

1 VNYRNASGVyHVTNDCPNSSIVYEAEHqlLHLPGClPCVRvGNQSRCWVALTPTVAvsYIG 
VNYrNASGVYHvTNDCPNSSIvYEAEHqlliHIjPGClPCVRvGNQSRCWVALTPTVAvsYIG
62 APLES iRRHVDLMVGAATVCSALYIGDLCGGVFLVGQMFSFQPRRHWTTODCNCS IYAGHV

in i iiiiiiiiiiiiiiiiii min miiiiiiiiiiiiiiiiiiiiiiiii 

62 APLdSLRRHVDLMVGAATVCSALYvGDLCGGaFLVGQMFSFQPRRHWTTQDCNCSIYAGHI 
APLdS IRRHVDLMVGAATVCSALYvGDLCGGaFLVGQMFSFQPRRHWTTQDCNCS IYAGHi 


SEQ ID NO: Isolate

94 Z7 123 TGHRMAWDMMMNWS PTTTLvLAQVMRI PSTLVDLLTGGHWG i Li G vAYFcMQANWAKVILV 

Mil II III II Ml Mill I 1 I I II I I I I I I I I I Mill I I Ml I I 1 I I I I I I I I 

93 Z6 123 TGHRMAWDMMMNW S PTTTL 1 IiAQVMRI P STLVDLLAGGHWG vLVG 1AYFSMQANWAKVI LV 

93-94 consensus (26 ) TGHRMAWDMMMNWSPTTTLlLAQVMRIPSTLVDLLaGGHWGvLvGlAYFsMQANWAKVILV
1 VPYRNASGVYHVTNDCPNSS IVYEADNLILHAPGCVPCVkegNVSRCWVQITPTLSAPNLG

I I i I I I I I I I I I I I I I i I I I ! I I I I I I I ! I I i I ! I I I I I llillllllllllllllll 

1 VPYRNASGVYHVTNDCPNSSIVYEADNLILHAPGCVPCVRQnNVSRCWVQITPTLSAPNLG 

! 1 1 1 1 f I I ! 1 1 1 1 i S I i I i 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 ! II It 1 1 1 III 1 1 1 1 1 II 1 1 1 II 1 1 I 

1 VPYRNASGVYHVTNDCPNSS IVYEADNLILHAPGCVPCVRQDNVS kCWVQITPTLSAPNLG 

llllllllllllllllllllllllll III II I I I I I! Ill llll I I II I I I I i I I I I 

1 VPYRNASGVYHVTNDCPNSSIVYEADsLILHAPGCVPCVRQDNVSRCWVQITPTLSAPtfG 

I I I II I I I I I I E I I I I I I I I I I I I I I II III llll III I I I I I I I I I I I I I I I I I I 

1 VP YRNASGVYHVTNDCPNS S I VYEADDLILHAPGCVPCVRkDNVSRCWVhITPTLSAPS LG 

I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I lllllll lllllllllll 

1 VP YRNASGVYHVTNDCPNS S I VYEADDL I LHAPGCVP CVRqgNVS RCWVql TPTLS AP S LG 
VPYRNASGVYHVTNDCPNSS I VYEADnLILHAPGCVPCVrqdNVSrCWVqlTPTLSAPnlG
62 AVTAP LRRvVD YLAGGAALC S ALYVGDACGAVFLVGQMF t YRPRQHTTVQD CNC S I YS GH I

I I I I I I II III III III llllll III! lllllll III! I I I I I I I I I I I I I I | | | | | | | 

62 AVTAP LRRAVD YLAGGAALC S ALYVGDACGAVFLVGQMF s YRPRQHTTVQD CNC S I YSGH I 

I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I lllllllllllllllllllll 

62 AVTAP LRRAVD YLAGGAALCSALYVGDACGAVFLVGQMFTYRPRQHTTVQDCNCSI YSGHI

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I 

62 AVTAP LRRAVD YLAGGAALCSALYVGDACGAVFLVGQMFTYRPRQHTTVQDCNCS I YSGHI 

I Hill III III I III II I II I III I I III I I I I I I I I I 1 I I I ] I I I I I I I I I I I I I I 

62 AVTAP LRRAVD YLAGGAALCSALYVGDvCGAl FLVGQMFTYRPRQHaTVQD CNCS I YSGHI

lllllllllllllllllllllillill III lllllllll II I I I I I I I I I I I I I I 

62 AVTAP LRRAVD YLAGGAALCSALYVGDaCGAvFLVGQMFTY s PRrHnvVQDCNCS I YSGHI
AVTAP LRRaVD YLAGGAALC S AL YVGDa CGAvFLVGQMF t Y r PRqH 1 1 VQD CNCS I YSGH I
123 TGHRMAWDMMMNWS PTTALVMAQvLRI PQWIDI I AGGHWGVLFAvAYFASAANWAKWLV

I M I I II I I I I I I I I I I I I II I I lllllllllllllllllllll I I I I I I j I I I | | | | | 

123 TGHRMAWDMMMNWS PTTALVMAQLLRIPQWID I IAGGHWGVLFAAAYFASAANWAKWLV 

llillllllllllllllll I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I If 

123 TGHRMAWDMMMNWSPTTALLMAQLLRI PQWIDI IAGGHWGVLFAAAYFASAANWAKViLV 

mmiimiimmim mmmmmmmmmmmi n 

123 TG HRMAWDMMMNW S PTTALLMAQMLR I PQWIDI IAGGHWGVLFAAAYFASAANWAKWLV 

I I I I I I I I I I I I I I I III lllllllllllll[|llllllllllllllllllll|||||l 

123 TGHRMAWDMMMNWS PaTALVMAQMLRI PQ WID I IAGGHWGVLFAAAYFASAANWAKWLV 

I I I I I II I II I I I I I lllllll lllllllllllll I I I I I I I II I I I I I I I I | I I | | 

123 TGHRMAWDMMMNW S P t TALVMAQ 1 LRI PQWIDI IAGaHWGVLFAAAYyASAANWAKWLV 


96-101 consensus TGHRMAWDMMMNWS PtTALvMAQl LRI PQWIDI IAGgHWGVLFAaAYf ASAANWAKVvLV

SEQ ID NO:  Genotype
81-84       (IV/2b)
85          (2c)
77-80       (III/2a)
86-90       (V/3a)
60-76       (II/1b)
52-59       (I/1a)
91          (4a)
93-94       (4c)
95          (4d)
92          (4b)
96-101      (5a)
102         (6a)

1 VEVRNiSs SYYATNDCSNnS ITWQLTnAVLHLPGCVPCENDNGTLrCWIQVTPNVAVKHRC 
1 VEVKDTGDSYMPTNDCSNSSIVWQLEGAVLHTPGCVPCERTANVSRCWVPVAPNLAISQPC 
1 aqVkNTstsYMVTNDCSNdSITWQLqAAVLHVPGCvPCEkvGNtSRCWIPVsPNVAVqqPC 
1 LEWRNt SGLYvLTNDCsNS S IVYEADDVILHTPGCVPCVQDGNTS tCWTpVTPTVAVRYVC 
1 yEVrNVSG vYhVTNDCSNs S i VyEaaDmlmHTPGCvPCV rEnNsS rCWVALtPTLAARNas 
1 yQVRNStGLYHVTNDCPNSSIVYEaADalLHsPGCVPCVREgnasrCWVavtPTVATRDG* 
1 EHYRNASG I YHITNDCPNS S I VYEADHH ILHLPGCVPCVWTGNTS RCWTPVTPTVAVAHPC 
1 VNYrNASGVYHVTNDCPNSSIvYEAEHqlLHLPGClPCVRvGNQSRCWVALTPTVAvsYIC 
1 YNYRNSSGVYHVTNDCPNSSIVYETDYHILHLPGCVPCVREGNKSTCWSLTPTVAAQHLt 
1 VHYRNASGVYHVTNDCPNTS I VYETEHH IMHLPGCVPCVRTENTSRCWVPLTPTVAAP YPt 
1 VP YRNASGVYHVTNDCPNSS IVYEADnLILHAPGCVPCV rqdNVS rCWVqlTPTLSAPnlC 
1 LTYGNSSGLYHLTNDCPNSSIVLEADAMILHLPGCLPCVRVDDRSTCWHAVTPTLAIPNAS 

Y TNDC NS H PGC PC CW P 


62 ALTHNLRtHvDmIVMAATVCSALYVGD vCGAVMI vSQAf I i SPE rHnFTQECNCS IYQGhI
62 ALTKGLRAHIDI I VMSATVCS ALYVGDVCGALMLAAQWWS PQHHTFVQECNCS IYPGRI
62 ALTQGLRTH I DMWMSATLCS ALYVGD 1 CGG vMlAAQMF I vS PqhHwFVQe CNCS IYPGTI
62 ATTAS I RSHVDLLVGAATmCS ALYVGDmCGAVFLVGQAFTFRPRRHQTVQTCNCSLYPGHl
62 vpTt 1 1 RrHVDLLVGAAaFCSaMYVGDLCGSVf LvSQLFTf S PRrheTvQdCNCS i YPGhv
62 LPatQLRRhIDLLVGSATLCSALYVGDLCGSVFLVgQLFTf SPRrhWTTQdCNCSIYPGHI 
62 APLESFRRHVDLMVGAATLCSALYVGDLCGGAFLMGQMITFRPRRHWTTQECNCSIYTGHI 
62 APLdSIRRHVDLMVGAATVCSALYvGDLCGGaFLVGQMFSFQPRRHWTTQDCNCSIYAGHi 
62 APLESLRRHVDLMVGGATLCSALYIGDVCGGVFLVGQLFTFQPRRHWTTQDCNCSIYTGHI 
62 APLESMRRHVDLMVGAATMCSAFYIGDLCGGVFLVGQLFDFRPRRHWTTQDCNCSIYPGHV 
62 AVTAPLRRaVDYLAGGAALCSALYVGDaCGAvFLVGQMFtYrPRqHttVQDCNCSIYSGHI 
62 TPATGFRRHVDLLAGAAWCSSLYIGDLCGSLFLAGQLFTFQPRRHWTVQDCNCS IYTGHV

R D A CS Y GD CG Q P Q CNCS Y G 


123 TGHRMAWDMMLNWSPTLTMILAYAARVPELvLeWFGGHWGWFGLAYFSMQGAWAKVIAI 
123 TGHRMAWDMMMNWSPTTTMLLAYLVRIPEVTLDIVTGGHWGVMFGLAYFSMQGSWAKVIVI 
123 TGHRMAWDMMMNWSPTaTmILAYaMRVPEVI iDIi sGAHWGVmFGLAYFSMQGAWAKVvVT 
123 S GHRMAWDMMMNWS PAVGMWAH vLRLPQTl FD I i AGAHWG 1 1AGLAYYSMQGNWAKVAI i 
123 BGHRMAWDMMMNWSPTaALVvSQLLRiPQAvvDmVaGAHWGvLAGLAYYSMvGNWAKVI.lv 
123 TGHRMAWDMMMNWS PTtALVvAQLLRi PQAiLDMlAGAHWGVLAGIAYFSMvGNWAKVl vV 
123 TGHRMAWDMMMNWSP TTT LLLAQIMRVPTAFLDMVAGGHWGVLAGLAYFSMQGNWAKWLV 
123 TGHRMAWDMMMNWS PTTTL1 LAQVMRI PSTLVDLLaGGHWGvLvGlAYFsMQANWAKVILV 
123 TGHRMAWDMMMNWS PTATLVLAQLMRI PGAMVDLLAGGHWGILVGIAYFSMQANWAKVILV 
123 SGHRMAWDMMMNWSPTSALIMAQILR1PSILGDLLTGGHWGVLAGLAFFSMQSNWAKVXLV 
123 TGHRMAWDMMMNW S P tTAL vMAQ 1 LRI PQWI D I IAGgHWGVLFAaAYf ASAANWAKVvLV 
123 TGHRMAWDMMMNWS PTTTLVLS S I LRVPE I CAS VI FGGHWG I LLAVAYFGMAGNWLKVLAV

SEQ ID NO:  ISOLATE
108         DR4
103         DK7
104         US11
105         S14
106         SW1
107         S18


1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGTCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGTCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGTCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGTCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGTCGCCCACAGG
1 ATGAGCACaAATCCTAAACCTCAAAGAAAAACCAAACGTAAGACCAACCGTCGCCCACAGG

ATGAGCACgAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGTCGCCCACAGG


62 ACGTCAAGTTCCCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTtAAGTTCCCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG

ACGTcAAGTTCCCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG 


123 CCCTAGATTGGGTGTGCGCGCGaCGAGGAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGA 
123 CCCTAGATTGGGTGTGCGCGCGcCGAGGAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGA 
123 CCCTAGATTGGGTGTGCGCGCGACGAGGAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGA 
123 CCCTAGATTGGGTGTGCGCGCGACGAGGAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGA 
123 CCCTAGATTGGGTGTGCGCGCGACGAGGAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGA
123 CCCTAGATTGGGTGTGCGCGCGACGAGGAAGACTTCCGAGCGGTCGCAACCTCGcGGTAGA 

CCCTAGATTGGGTGTGCGCGCGaCGAGGAAGACTTCCGAGCGGTCGCAACCTCGaGGTAGA 


184 CGTCAGCCTATCCCCAAGGCgCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACC
184 CGTCAGCCTATCCCCAAGGCACGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACC 
184 CGTCAGCCTATCCCCAAGGCACGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACC 
184 CGTCAGCCTATCCCCAAGGCACGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTATC 
184 CGTCAGCCTATCCCCAAGGCGCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTATC
184 CGTCAGCCTATCCCCAAGGCGCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTAcC

CGTCAGCCTATCCCCAAGGC-CGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTAcC


245 CTTGGCCCCTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCcCCCCGTGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTGCGGaTGGGCGGGATGGCTCCTGTCCCCCCGTGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTGCGGgTGGGCGGGATGGCTCCTGTCCCCCCGTGG 

CTTGGCCCCTCTATGGCAATGAGGGCTGCGGgTGGGCGGGATGGCTCCTGTC-CCCCGTGG


306 CTCTCGGCCTAGCTGGGGCCCCACAGACCCCCGGCGtAGGTCGCGCAATTTGGGTAAgGTC 
306 CTCTCGGCCTAGCTGGGGCCCCACAGACCCCCGGCGcAGGTCGCGCAATTTGGGTAAaGTC 
306 CTCTCGGCCTAGCTGGGGCCCCACgGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGCTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGCTGGGGCCCTACAGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTC 
306 CTCcCGGCCTAGCTGGGGCCCTACAGACCCCCGGCGTAGGTCGCGCAATTTGGGcAAaGTC 


103-108 consensus CTCtCGGCCTAGCTGGGGCCCcACaGACCCCCGGCGtAGGTCGCGCAATTTGGGtAAgGTC
367 ATCGAcACCCTcACGTGCGGCTTCGCCGACCTCATGGGGTACATcCCGCTCGTCGGCGCCC
367 ATCGATACCCTTACGTGCGGCTTCGCCGACCTCATGGGGTACATACCGCTCGTCGGCGCCC 
367 ATCGATACCCTTACGTGCGGCTTCGCCGACCTCATGGGGTACATACCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACGTGCGGCTTCGCCGACCTCATGGGGTACATACCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACGTGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACGTGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 

ATCGAtACCCTcACGTGCGGCTTCGCCGACCTCATGGGGTACATaCCGCTCGTCGGCGCCC 


428 CcCTTGGgGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGaGTTCTGGAAGACGGCGTGAA 
428 CTCTTGGAGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAAGACGGCGTGAA 
428 CTCTCGGAGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAAGACGGCGTGAA 
428 CcCTCGGgGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAAGACGGCGTGAA 
428 CTCTtGGAGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAAGACGGCGTGAA 
428 CTCTcGGAGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAAGACGGCGTGAA 

CtCT-GGaGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGgGTTCTGGAAGACGGCGTGAA 


489 

489 

489 

489 

489 

489 


CTATGCAACAGGGAAtCTTCCTGGTTGCTCTTTCTCTATCTTCCTTTTGGCttTGCTCTCT 

CTATGCAACAGGGAACCTrCCTGGTTGCTCTTTCTCTATCTrCCTnTGGCCCTGCTCTCT 

CTATGCAACAGGGAACCTTCCTGGTTGCTCTTTCTCTATCTTCCTTCTGGCCCTGCTCTCT 

CTATGCAACAGGGAACCTTCCTGGTTGCTCTTTCTCTATCTTCCTTCTGGCCCTGCTTTCT 


CTATGCAACAGGGAAcCTTCCTGGTTGCTCTTTCTCTATCTTCCTtcTgGCccTGCTcTCT 


550 TGCtTGACCGTGCCCGCaTCGGCC 
550 TGCCTGACCGTGCCCGCTTCGGCC
550 TGCCTGACTGTGCCCGCTTCAGCC 
550 TGCCTGACTGTGCCCGCTTCAGCC 
550 TGCCTGACaGTGCCCGCGTCAGCC 
550 TGtCTGACtGTGCCCGCGTCAGCt 

TGccTGACtGTGCCCGCtTCaGCc
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG 

1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCGACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAgACCAAACGTAACACCAACCGCCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAcAAACCAAACGTAACACCAACCGCCGCCCACAGG
1 ATGAGCACGAcTCCTAAACCTCAAAGAAAAACCAAACGTAACACCAgCCGCCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG 

ATGAGCACGAaTCCTAAACCTCAAAGAaAaACCAAACGTAACACCAaCCGCCGCCCACAGG


62 ACGTtAAGTTCCCGGGCGGTGGtCAGATCGTcGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGCCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGCCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGCCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTtAAGTTCCCGGGCGGTGGCCAGATCGTcGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTcTAtCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGtGGcGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTTAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTTAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG

ACGTcAAGTTCCCGGGcGGtGGtCAGATCGTtGGTGGAGTtTAcCTGTTGCCGCGCAGGGG 


123 CCCCAGGTTGGGTGTGCGCGCaACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCcGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACgAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCaCAACCTCGTGGAcGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGaTCGCAACCTCGTGGcAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACCAGGAAGACTTCaGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACCAGGAAGACTTCcGAGCGGTCGCAACCTCGTGGAAGG 

CCCCaGGTTGGGTGTGCGCGCgACtAGGAAGACTTCcGAGCGgTCgCAACCTCGTGGaaGG 


184 CGACAACCTATCCCCAAGGCTCGCCatCCCGAGGGcAGGGCCTGGGCTCAGCCCGGGTACC
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGTAGGGCCTGGGCTCAGCCCGGGTACC
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGTAGGGCCTGGGCTCAGCCCGGGcACC 
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGTAGGGCCTGGGCTCAGCCCGGGTACC 
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGCAGGGCCTGGGCTCAGCCCGGGTACC 
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGCAGGGCCTGGGCTCAGCCCGGGTACC 
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGCAGGGCCTGGGCTCAGCCCGGGTACC 
184 CGACAgCCTATCCCCAAGGCTCGCCAGCCCGAGGGCAGGGCCTGGGCTCAGCCCGGGTACC 
184 CGACAACCTATCCCCAAGGCTCGCCAGCCCGAGGGCAGGGCCTGGGCTCAGCCtGGGTACC
184 CGACAACCTATCCCCAAGGCTCGCCAaCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACC 
184 CGACAACCTATCCCCAAGGCTCGCCAGCCCGAGGGCAGGACCTGGGCCCAGCCCGGGTACC 
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGCAGGGCCTGGGCCCAGCCCGGGCAtC
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGTAGGGCCTGGGCTCAGCCCGGGCACC
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGTAGGGCCTGGGCTCAGCCCGGGTACC 
184 CGACAACCTATCCCCAAGGCTCGCCaACCCGAGGGCAGGACCTGGGCTCAGCCCGGGTATC
184 CGACAACCTATCCCCAAGGCTCGCCgACCCGAGGGCAGGACCTGGGCTCAGCCCGGGTATC

CGACAaCCTATCCCCAAGGCTCGCCggCCCGAGGGcAGGgCCTGGGCtCAGCCcGGGtAcC 


245 CTTGGCCCCTCTAcGGCAATGAGGGCTTGGGGTGGGCAGGATGGCTCCTGTCACCCCGtGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTTGGGGTGGGCAGGATGGCTCCTGTCACCCCGCGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTTGGGGTGGGCAGGATGGCTCCTGTCACCCCGCGG 
245 CTTGGCCCCTCTATGGCAACGAGGGCTTGGGGTGGGCAGGATGGCTCCTGTCACCCCGCGG 
245 CTTGGCCCCTCTATGGCAACGAGGGCaTGGGGTGGGCAGGATGGCTCCTGTCACCCCGTGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCtTGGGGTGGGCAGGATGGCTCCTGTCACCCCGTGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCATGGGGTGGGCAGGATGGCTCCTGTCACCCCGcGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCATGGGGTGGGCAGGATGGCTCCTGTCACCCCGtGG 
245 CcTGGCCCCTCTATGGCAATGAGGGCATGGGaTGGGCAGGATGGCTCCTGTCcCCCCGCGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCATGGGGTGGGCAGGATGGCTCCTGTCACCCCGCGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTTGGGGTGGGCAGGATGGCTCCTGTCACCCCGTGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTTGGGGTGGGCAGGATGGCTCCTGTCACCCCGTGG 
245 CTTGGCCCCTCTATGcCAATGAGGGCTTGGGGTGGGCgGGATGGCTCCTGTCACCCCGCGG 
245 CTTGGCCCCTCTATGGCgACGAGGGCATGGGGTGGGCAGGATGGCTCCTGTCACCCCGCGG 
245 CTTGGCCCCTCTATGGCAACGAGGGCATGGGGTGGGCAGGATGGCTCCTGTCACCCCGCGG 
245 CTTGGCCCCTCTATGGCAAtGAGGGCATGGGGTGGGCAGGATGGCTCCTGTCACCCCatGG 

CtTGGCCCCTCTAtGgCaAtGAGGGC-TGGGgTGGGCaGGATGGCTCCTGTCaCCCCgcGG


306 cTCTCGGCCTAGTTGGGGCCCCAatGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAgGTC 
306 tTCTCGGCCTAGTTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAaGTC 
306 CTCTCGGCCTAGTTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAGGTC 
306 CTCCCGGCCTAGTTGGGGCCCCACcGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAGGTC 
306 CTCCCGGCCTAGTTGGGGCCCCACGGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGTTGGGGCCCCACGGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGTTGGGGCCCCAacGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAGGTC 
306 CTCcCGGCCTAGTTGGGGCCCCACaGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGTTGGGGCCCCACtGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGTTGGGGCCCCACGGACCCCCGGCGTAGGTCGCGcAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGTTGGGGCCCCACGGACCCCCGGCGTAGGTCGCGtAATTTGGGTAAGGTC 
306 CTCCCGGCCTAGTTGGGGCCCCACGGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTC 
306 CTCCCGGCCTAGTTGGGGCCCCACGGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTC 
306 CTCCCGGCCTAATTGGGGCCCCACaGACCCCCGGCGTAGGTCGCGtAATcTGGGTAAGGTC 
306 CTCTCGGCCTAATTGGGGCCCCACGGACCCCCGGCGTAGGTCGCGcAATTTGGGTAAGGTC 
306 CTCTCGGCCTAgTTGGGGCCCCACGGACCCCCGGCGTAGGTCGCGtAATTTGGGTAAGGTC

cTCtCGGCCTAgTTGGGGCCCCAcgGACCCCCGGCGTAGGTCGCGtAATtTGGGTAAgGTC 


367 ATCGATACCCTCACATGCGGCTTtGCCGACCTCATGGGGTACATtCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGCGCCC
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGCGCCC
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACgTGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGgCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCtC 
367 ATCGATACCCTCACGTGCGGCTTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGtGCCC 
367 ATCGATACCCTCACGTGCGGCTTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGcGCCC 

ATCGATACCCTCACaTGCGGCTTcGCCGACCTCATGGGGTACATtCCGCTCGTCGGcGccC 


428 CCCTAGGGGGCGCTGCCAGGGCtCTGGCGCATGGCGTCCGGGTtCTGGAGGACGGCGTGAA 
428 CCCTAGGGGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTCCTGGAGGACGGCGTGAA 
428 CCCTAGGGGGTGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTCCTGGAGGACGGCGTGAA 
428 CCCTAGGGGGTGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAGGACGGCGTGAA 
428 CCCTAGGGGGCGCTGCCAGGGCCtTGGCGCATGGCGTCCGGGTTCTGGAGGACGGCGTGAA 
428 CCCTAGGGGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAGGACGGCGTGAA 
428 CCCTAGGGGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAGGACGGCGTGAA 
428 CCCTAGGGGGCGCTGCCAGGGCtCTGGCaCATGGtGTCCGGGTTCTGGAGGACGGCGTGAA 
428 CCCTAGGGGGCGCTGCCAGGGCCCTGGCgCATGGcGTCCGGGTcCTGGAGGACGGCGTGAA 
428 CCTTAGGGGGCGtTGCCAGaGCCCTGGCaCATGGtGTCCGGGTTgTGGAGGACGGCGTGAA 
428 CtTTAGGGGGCGCTGCCAGgGCCTTGGCGCATGGCGTCCGGGTTCTGGAaGACGGCGTGAA 
428 CCCTAGGGGGCGCTGCCAGaGCCTTGGCGCATGGCGTCCGGGTTCTGGAGGACGGCGTGAA 
428 CCCTAGGGGGCGTTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTgTGGAGGACGGCGTGAA 
428 CCtTAGGGGGCGTTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAGGACGGCGTGAA 
428 CCCTAGGGGGCGTTGCCAGAGCCtTGGCACATGGTGTCCGGGTTCTGGAGGACGGCGTGAA 
428 CCCTAGGGGGCGTTGCCAGAGCCcTGGCACAcGGTGTCCGGGTTCTGGAGGACGGCGTGAA 

CccTAGGGGGcGcTGCCAGgGCccTGGCgCAtGGcGTCCGGGTtcTGGAgGACGGCGTGAA 


489 CTATGCAACAGGGAACcTcCCCGGTTGCTCTTTCTCTATCTTCCTTcTgGCTTTGCTgTCC 
489 CTATGCAACAGGGAACTTGCCCGGTTGCTCTTTCTCTATCTTCCTTTTaGCTTTGCTATCC
489 
489 
489 

489 CTATGCAACAGGGAATcTGCCCGGTTGCTCTTTCTCTATCTTCCTCTTGGCTTTGCTGTCC 
489 CTAcGCAACAGGGAATTTGCCCGGTTGCTCTTTCTCTATCTTCCTCTTGGCTCTGtTGTCC 
489 CTATGCAACAGGGAATTTGCCCGGTTGCTCTTTTTCTATCTTCCTCTTGGCTCTGCTGTCt 
489 CTATGCAACAGGGAATcTGCCCGGTTGCTCcTTTTCTATCTTCCTCTTGGCTtTGCTGTCC 
489 CTATGCAACAGGGAATTTGCCCGGTTGCTCTTTCTCTATCTTCCTCTTGGCTcTGCTGTCC 
489 CTATGCAACAGGGAATTTGCCCGGTTGCcCTTTCTCTATCTTCCTCTTGGCTtTGCTGTCC 
489 CTATGCAACAGGGAATCTGCCCGGTTGCTCTTTCTCTATCTTCCTCTTGGCTcTGCTGTCC 
489 CTATGCAACAGGGAATCTGCCTGGTTGCTCTTTCTCTATCTTCCTtTTGGCTTTGCTGTCt

489 CTAtGCAACAGGGAATTTACCCGGTTGCTCTTTCTCTATCTTCCTCTTGGCTTTGCTGTCC 
489 CTAcGCAACAGGGAATaTACCCGGTTGCTCTTTCTCTATCTTCCTtTTGGCTTTGCTGTCC 

cTAtGCAACAGGGAAttTgCCcGGTTGCtCtTTcTCTATCTTCCTctTgGCTtTGcTgTCc 


550 TGTTTGACCATCCCAGCTTCCGCT 
550 TGTTTGACCATCCCAGCTTCCGCT 
550 TGTTTGACCgTCCCAGCTTCCGCT 
550 TGTTTGACCATCCCAGCTTCCGCT 
550 TGTTTGACCATtCCAGCTTCCGCT
109-124 consensus


TGttTgACCatcCCAGctTCCGCt 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAAAgACCAAACGTAACACCAACCGCCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAAGACCAACCGCCGCCCACAGG
1 ATGAGCACGAATCCTAAACCTCAAAGAcAAACCAAACGTAACACCAACCGCCGCCCACAGG
1 ATGAGCACGAcTCCTAAACCTCAAAGAAAAACCAAACGTAACACCAgCCGCCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGTCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGTCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGTCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGTCGCCCACAGG 
1 ATGAGCACaAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGTCGCCCACAGG 
1 ATGAGCACgAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGTCGCCCACAGG

ATGAGCACgAaTCCTAAACCTCAAAGAaAaACCAAACGTAACACCAaCCGcCGCCCACAGG


62 ACGTCAAGTTCCCGGGCGGTGGCCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGCCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG

62 ACGTCAAGTTCCCGGGCGGTGGCCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTtAAGTTCCCGGGCGGTGGCCAGATCGTcGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTcTAtCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGtGGcGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTTAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTTAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACCTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTtAAGTTCCCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTcAAGTTCCCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG

ACGTcAAGTTCCCGGGcGGtGGtCAGATCGTtGGTGGAGTtTAccTGTTGCCGCGCAGGGG 


123 CCCCAGGTTGGGTGTGCGCGCaACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
123 CCCCcGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG
123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 
SA10 123 CCCCAGGTTGGGTGTGCGCGCGACgAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG

S45 123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCaCAACCTCGTGGAcGG 

P8 123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGaTCGCAACCTCGTGGcAGG 

T3 123 CCCCAGGTTGGGTGTGCGCGCGACTAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 

HK3 123 CCCCAGGTTGGGTGTGCGCGCGACCAGGAAGACTTCaGAGCGGTCGCAACCTCGTGGAAGG 

HK5 123 CCCCAGGTTGGGTGTGCGCGCGACCAGGAAGACTTCCGAGCGGTCGCAACCTCGTGGAAGG 

DR4 123 CCCTAGATTGGGTGTGCGCGCGACGAGGAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGA 

US11 123 CCCTAGATTGGGTGTGCGCGCGACGAGGAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGA

S14 123 CCCTAGATTGGGTGTGCGCGCGACGAGGAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGA

SW1 123 CCCTAGATTGGGTGTGCGCGCGACGAGGAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGA

S18 123 CCCTAGATTGGGTGTGCGCGCGACGAGGAAGACTTCCGAGCGGTCGCAACCTCGcGGTAGA

DK7 123 CCCTAGATTGGGTGTGCGCGCGcCGAGGAAGACTTCCGAGCGGTCGCAACCTCGaGGTAGA 


103-124 consensus 


CCCcaGgTTGGGTGTGCGCGCgaCtAGGAAGACTTCcGAGCGgTCgCAACCTCGtGGaaGg 


SEQ ID NO: ISOLATE
119 S9
117 IND3
118 IND8
111 D1
112 US6
113 P10
114 DK1
115 T10
116 SW2
122 HK4
109 SA10
110 S45
123 P8
124 T3
120 HK3
121 HK5
108 DR4
104 US11
105 S14
106 SW1
107 S18
103 DK7


184 CGACAACCTATCCCCAAGGCTCGCCatCCCGAGGGcAGGGCCTGGGCTCAGCCCGGGTACC 
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGTAGGGCCTGGGCTCAGCCCGGGTACC 
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGTAGGGCCTGGGCTCAGCCCGGGcACC 
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGTAGGGCCTGGGCTCAGCCCGGGTACC 
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGCAGGGCCTGGGCTCAGCCCGGGTACC 
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGCAGGGCCTGGGCTCAGCCCGGGTACC 
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGCAGGGCCTGGGCTCAGCCCGGGTACC 
184 CGACAgCCTATCCCCAAGGCTCGCCAGCCCGAGGGCAGGGCCTGGGCTCAGCCCGGGTACC 
184 CGACAACCTATCCCCAAGGCTCGCCAGCCCGAGGGCAGGGCCTGGGCTCAGCCtGGGTACC
184 CGACAACCTATCCCCAAGGCTCGCCAaCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACC 
184 CGACAACCTATCCCCAAGGCTCGCCAGCCCGAGGGCAGGACCTGGGCCCAGCCCGGGTACC 
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGCAGGGCCTGGGCCCAGCCCGGGCAtC 
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGTAGGGCCTGGGCTCAGCCCGGGCACC 
184 CGACAACCTATCCCCAAGGCTCGCCGGCCCGAGGGTAGGGCCTGGGCTCAGCCCGGGTACC 
184 CGACAACCTATCCCCAAGGCTCGCCaACCCGAGGGCAGGACCTGGGCTCAGCCCGGGTATC 
184 CGACAACCTATCCCCAAGGCTCGCCGACCCGAGGGCAGGACCTGGGCTCAGCCCGGGTATC 
184 CGTCAGCCTATCCCCAAGGCgCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACC 
184 CGTCAGCCTATCCCCAAGGCACGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACC 
184 CGTCAGCCTATCCCCAAGGCACGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTATC 
184 CGTCAGCCTATCCCCAAGGCGCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTATC 
184 CGTCAGCCTATCCCCAAGGCGCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACC 
184 CGTCAGCCTATCCCCAAGGCaCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACC 


103-124 consensus 


CGaCAaCCTATCCCCAAGGCtCGcCggCCCGAGGGcAGGgCCTGGGCtCAGCCcGGGtAcC 


SEQ ID NO: ISOLATE
119 S9
117 IND3
118 IND8
111 D1
112 US6
113 P10
114 DK1
115 T10
116 SW2
122 HK4
109 SA10
110 S45
123 P8
124 T3
120 HK3
121 HK5
108 DR4
104 US11
105 S14
106 SW1
107 S18


245 CTTGGCCCCTCTAcGGCAATGAGGGCTTGGGGTGGGCAGGATGGCTCCTGTCACCCCGtGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTTGGGGTGGGCAGGATGGCTCCTGTCACCCCGCGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTTGGGGTGGGCAGGATGGCTCCTGTCACCCCGCGG 
245 CTTGGCCCCTCTATGGCAACGAGGGCTTGGGGTGGGCAGGATGGCTCCTGTCACCCCGCGG 
245 CTTGGCCCCTCTATGGCAACGAGGGCaTGGGGTGGGCAGGATGGCTCCTGTCACCCCGTGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCtTGGGGTGGGCAGGATGGCTCCTGTCACCCCGTGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCATGGGGTGGGCAGGATGGCTCCTGTCACCCCGcGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCATGGGGTGGGCAGGATGGCTCCTGTCACCCCGtGG 
245 CcTGGCCCCTCTATGGCAATGAGGGCATGGGaTGGGCAGGATGGCTCCTGTCcCCCCGCGG
245 CTTGGCCCCTCTATGGCAATGAGGGCATGGGGTGGGCAGGATGGCTCCTGTCACCCCGCGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTTGGGGTGGGCAGGATGGCTCCTGTCACCCCGTGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTTGGGGTGGGCAGGATGGCTCCTGTCACCCCGTGG 
245 CTTGGCCCCTCTATGcCAATGAGGGCTTGGGGTGGGCgGGATGGCTCCTGTCACCCCGCGG 
245 CTTGGCCCCTCTATGGCgACGAGGGCATGGGGTGGGCAGGATGGCTCCTGTCACCCCGCGG 
245 CTTGGCCCCTCTATGGCAACGAGGGCATGGGGTGGGCAGGATGGCTCCTGTCACCCCGCGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCATGGGGTGGGCAGGATGGCTCCTGTCACCCCaTGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCcCCCCGTGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTGCGGaTGGGCGGGATGGCTCCTGTCCCCCCGTGG 
245 CTTGGCCCCTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCCCCCCGTGG 
103 DK7 245 CTTGGCCCCTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCtCCCCGTGG

103-124 consensus CtTGGCCCCTCTAtGgCaAtGAGGGCttgGGgTGGGCaGGATGGCTCCTGTCaCCCCgtGG 


SEQ ID NO:

119 

117 

118 
111 
112 

113 

114 

115 

116 
122 

109 

110 

123 

124 

120 
121 
108 

104 

105 

106 
107 
103 


D1 
US6
P10 
DK1 
T10 
SW2 
HK4 
SA10 
S45
P8 
T3 
HK3 
HK5 
DR4 
US11 
S14 
SW1 
S18 
DK7 


306 cTCTCGGCCTAGTTGGGGCCCCAatGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAgGTC 
306 tTCTCGGCCTAGTTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAaGTC 
306 CTCTCGGCCTAGTTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAGGTC 
306 CTCCCGGCCTAGTTGGGGCCCCACcGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAGGTC 
306 CTCCCGGCCTAGTTGGGGCCCCACGGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGTTGGGGCCCCACGGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGTTGGGGCCCCAacGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAGGTC 
306 CTCcCGGCCTAGTTGGGGCCCCACaGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGTTGGGGCCCCACtGACCCCCGGCGTAGGTCGCGTAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGTTGGGGCCCCACGGACCCCCGGCGTAGGTCGCGcAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGTTGGGGCCCCACGGACCCCCGGCGTAGGTCGCGtAATTTGGGTAAGGTC 
306 CTCCCGGCCTAGTTGGGGCCCCACGGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTC 
306 CTCCCGGCCTAGTTGGGGCCCCACGGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTC 
306 CTCCCGGCCTAATTGGGGCCCCACaGACCCCCGGCGTAGGTCGCGtAATcTGGGTAAGGTC 
306 CTCTCGGCCTAATTGGGGCCCCACGGACCCCCGGCGTAGGTCGCGcAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGTTGGGGCCCCACGGACCCCCGGCGTAGGTCGCGtAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGCTGGGGCCCCACaGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGCTGGGGCCCCACgGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGCTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTC 
306 CTCTCGGCCTAGCTGGGGCCCTACAGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTC 
306 CTCcCGGCCTAGCTGGGGCCCTACAGACCCCCGGCGTAGGTCGCGCAATTTGGGcAAAGTC 
306 CTCtCGGCCTAGCTGGGGCCCcACAGACCCCCGGCGcAGGTCGCGCAATTTGGGtAAAGTC 


103-124 consensus 


cTCtCGGCCTAgtTGGGGCCCcAc-GACCCCCGGCGtAGGTCGCGtAATtTGGGtAAgGTC


SEQ ID NO: ISOLATE
119 S9
117 IND3
118 IND8
111 D1
112 US6
113 P10
114 DK1
115 T10
116 SW2
122 HK4
109 SA10
110 S45
123 P8
124 T3
120 HK3
121 HK5
108 DR4
104 US11
105 S14
106 SW1
107 S18
103 DK7


367 ATCGATACCCTCACATGCGGCTTtGCCGACCTCATGGGGTACATtCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACgTGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGgCC 
367 ATCGATACCCTCACATGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCtC 
367 ATCGATACCCTCACGTGCGGCTTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGtGCCC 
367 ATCGATACCCTCACGTGCGGCTTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGCGCCC 
367 ATCGAcACCCTCACGTGCGGCTTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGCGCCC 
367 ATCGATACCCTtACGTGCGGCTTCGCCGACCTCATGGGGTACATACCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACGTGCGGCTTCGCCGACCTCATGGGGTACATACCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACGTGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTCACGTGCGGCTTCGCCGACCTCATGGGGTACATTCCGCTCGTCGGCGCCC 
367 ATCGATACCCTtACGTGCGGCTTCGCCGACCTCATGGGGTACATaCCGCTCGTCGGCGCCC 


103-124 consensus 


ATCGAtACCCTcACaTGCGGCTTcGCCGACCTCATGGGGTACATtCCGCTCGTCGGcGccC 


SEQ ID NO: ISOLATE
119 S9
117 IND3
118 IND8
111 D1
112 US6


428 CCCTAGGGGGCGCTGCCAGGGCtCTGGCGCATGGCGTCCGGGTtCTGGAGGACGGCGTGAA 
428 CCCTAGGGGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTCCTGGAGGACGGCGTGAA 
428 CCCTAGGGGGTGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTCCTGGAGGACGGCGTGAA 
428 CCCTAGGGGGTGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAGGACGGCGTGAA 
428 CCCTAGGGGGCGCTGCCAGGGCCtTGGCGCATGGCGTCCGGGTTCTGGAGGACGGCGTGAA 
P10 428 CCCTAGGGGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAGGACGGCGTGAA

DK1 428 CCCTAGGGGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAGGACGGCGTGAA 

T10 428 CCCTAGGGGGCGCTGCCAGGGCtCTGGCaCATGGtGTCCGGGTTCTGGAGGACGGCGTGAA 

SW2 428 CCCTAGGGGGCGCTGCCAGGGCCCTGGCgCATGGcGTCCGGGTcCTGGAGGACGGCGTGAA 

HK4 428 CCTTAGGGGGCGtTGCCAGaGCCCTGGCaCATGGtGTCCGGGTTgTGGAGGACGGCGTGAA 

SA10 428 CtTTAGGGGGCGCTGCCAGgGCCTTGGCGCATGGCGTCCGGGTTCTGGAaGACGGCGTGAA 

S45 428 CCCTAGGGGGCGCTGCCAGaGCCTTGGCGCATGGCGTCCGGGTTCTGGAGGACGGCGTGAA 

P8 428 CCCTAGGGGGCGTTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTgTGGAGGACGGCGTGAA 

T3 428 CCtTAGGGGGCGTTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAGGACGGCGTGAA 

HK3 428 CCCTAGGGGGCGTTGCCAGAGCCtTGGCACATGGTGTCCGGGTTCTGGAGGACGGCGTGAA 

HK5 428 CCCTAGGGGGCGTTGCCAGAGCCCTGGCACAcGGTGTCCGGGTTCTGGAGGACGGCGTGAA 

DR4 428 CCCTtGGGGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGaGTTCTGGAAGACGGCGTGAA 

US11 428 CtCTCGGaGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAAGACGGCGTGAA 

S14 428 CcCTCGGgGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAAGACGGCGTGAA 

SW1 428 CTCTtGGAGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAAGACGGCGTGAA 

S18 428 CTCTcGGAGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAAGACGGCGTGAA

DK7 428 CTCTtGGAGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAAGACGGCGTGAA 


103-124 consensus 


CccTaGGgGGcGcTGCCAGgGCccTGGCgCAtGGcGTCCGgGTtcTGGAgGACGGCGTGAA 


SEQ ID NO:

119 

117 

118 
111 
112 

113 

114 

115 

116 
122 

109 

110 

123 

124 

120 
121 
108 

104 

105 

106 
107 
103 

103-124 


IND3 
IND8 
D1 
US6
P10 
DK1 
T10 
SW2 
HK4 
SA10 
S45 
P8 
T3 
HK3 
HK5 
DR4 
US11 
S14 
SW1 
S18 
DK7 

consensus 


489 CTATGCAACAGGGAACcTcCCCGGTTGCTCTTTCTCTATCTTCCTTcTgGCTTTGCTgTCC
489 CTATGCAACAGGGAACTTGCCCGGTTGCTCTTTCTCTATCTTCCTTTTaGCTTTGCTATCC
489 CTATGCAACAGGGAACTTGCCCGGTTGCTCTTTCTCTATCTTCCTTTTGGCTTTGCTATCC
489 tTATGCAACAGGGAAtTTGCCCGGTTGCTCTTTCTCTATCTTCCTCTTGGCTTTGCTGTCC 
489 CTATGCAACAGGGAAcTTGCCCGGTTGCTCTTTCTCTATCTTCCTCTTGGCTTTGCTGTCC 
489 CTATGCAACAGGGAATcTGCCCGGTTGCTCTTTCTCTATCTTCCTCTTGGCTTTGCTGTCC 
489 CTAcGCAACAGGGAATTTGCCCGGTTGCTCTTTCTCTATCTTCCTCTTGGCTCTGtTGTCC 
489 CTATGCAACAGGGAATTTGCCCGGTTGCTCTTTTTCTATCTTCCTCTTGGCTCTGCTGTCt 
489 CTATGCAACAGGGAATcTGCCCGGTTGCTCcTTTTCTATCTTCCTCTTGGCTtTGCTGTCC 
489 CTATGCAACAGGGAATTTGCCCGGTTGCTCTTTCTCTATCTTCCTCTTGGCTcTGCTGTCC 
489 CTATGCAACAGGGAATTTGCCCGGTTGCcCTTTCTCTATCTTCCTCTTGGCTtTGCTGTCC 
489 CTATGCAACAGGGAATCTGCCCGGTTGCTCTTTCTCTATCTTCCTCTTGGCTcTGCTGTCC 
489 
489 

489 CTAtGCAACAGGGAATTTACCCGGTTGCTCTTTCTCTATCTTCCTCTTGGCTTTGCTGTCC 
489 CTAcGCAACAGGGAATaTACCCGGTTGCTCTTTCTCTATCTTCCTtTTGGCTTTGCTGTCC
489 CTATGCAACAGGGAATCTTCCTGGTTGCTCnTCrCl’ATCTTC d ' l ' l ’ l ’GGC T ’ I ’ l ’GCTCTCT
489 CTATGCAACAGGGAACCTTCCTGGTTGCTCTTTCTCTATCTTCCTTCTGGCCCTGCTCTCT 
489 CTATGCAACAGGGAACCTTCCTGGTTGCTCTTTCTCTATCTTCCTcCTaGCCCTGCTTTCT
489 CTATGCAACAGGGAACCTTCCTGGTTGCTCTTTCTCTATCTTCCTTCTGGCCCTGCTTTCT
489 CTATGCAACAGGGAACCTTCCTGGTTGCTCTTTCTCTATCTTCCTTCTGGCCCTGCTCTCT
489 CTATGCAACAGGGAACCTTCCTGGTTGCTCTTTCTCTATCTTCCTTtTGGCCCTGCTCTCT

cTAtGCAACAGGGAAtcTgCCcGGTTGCtCtTTcTCTATCTTCCTctTgGCttTGcTgTCc


SEQ ID NO: ISOLATE

119 S9 550 TGTTTGACCATCCCAGCTTCCGCT
117 IND3 550 TGTTTGACCATCCCAGCTTCCGCT
118 IND8 550 TGTTTGACCgTCCCAGCTTCCGCT
111 D1 550 TGTTTGACCATCCCAGCTTCCGCT
112 US6 550 TGTTTGACCATCCCAGCTTCCGCT
113 P10 550 TGccTGACCATCCCAGCgTCCGCT
114 DK1 550 TGTtTGACCATCCCAGCTTCCGCc
115 T10 550 TGTCTGACCATCCCAGCTTCCGCT
116 SW2 550 TGTCTGACCATCCCAGCTTCCGCT
122 HK4 550 TGTTTGACCATCCCAGCTTCCGCT
109 SA10 550 TGTTTaACCATCCCAGCTTCCGCT
110 S45 550 TGcTTGACCATCCCAGCTTCCGCT
123 P8 550 TGtcTGACCATCCCAGCTTCCGCT
124 T3 550 TGCTTGACCATCCCAGCTTCCGCT
120 HK3 550 TGCTTGACCACCCCAGCTTCCGCT
121 HK5 550 TGCcTGACCACCCCAGtTTCCGCT
DR4 550 TGCtTGACCGTGCCCGCaTCgGCC

US11 550 TGCCTGACTGTGCCCGCTTCAGCC

S14 550 TGCCTGACTGTGCCCGCTTCAGCC 

SW1 550 TGCCTGACaGTGCCCGCGTCAGCC 

S18 550 TGtCTGACtGTGCCCGCGTCAGCt 

DK7 550 TGcCTGACcGTGCCCGCtTCgGCc 


103-124 consensus 


TGttTgACcatcCCaGctTCcGCt 

SEQ ID NO: ISOLATE
128 T2
125 T4
126 US10
127 T9
125-128 consensus

1 ATGAGCACAAtTCCTAAACCTCAAAGAAAAACCAAAAGAAACACtAACCGTCGCCCACAaG 
1 ATGAGCACAAATCCTAAACCTCAAAGAAAAACCAAAAGAAACACcAACCGTCGCCCACAgG 
1 ATGAGCACAAATCCTAAACCTCAAAGAAAAACCAAAAGAAACACtAACCGTCGCCCACAaG 
1 ATGAGCACAAATCCaAAACCcCAAAGAAAAACCAtAAGAAACACcAACCGTCGCCCACAgG 

ATGAGCACAAaTCCtAAACCtCAAAGAAAAACCAaAAGAAACAC-AACCGTCGCCCACA-G


62 ACGTTAAGTTtCCGGGCGGCGGCCAGATCGTTGGCGGAGTATACTTGcTGCCGCGCAGGGG
62 ACGTTAAGTTcCCGGGCGGCGGCCAGATCGTTGGCGGAGTATACTTGTTGCCGCGCAGGGG
62 ACGTTAAGTTtCCGGGCGGCGGCCAGATCGTTGGCGGAGTATACTTGTTGCCGCGCAGGGG
62 ACGTTAAGTTcCCGGGCGGCGGCCAGATCGTTGGCGGAGTATACTTGTTGCCGCGCAGGGG

ACGTTAAGTT-CCGGGCGGCGGCCAGATCGTTGGCGGAGTATACTTGtTGCCGCGCAGGGG


123 CCCCAGGTTGGGTGTGCGCGCGACAAGGAAGACTTCGGAGCGgTCCCAGCCtCGTGGaAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACAAGGAAGACTTCGGAGCGaTCCCAGCCACGTGGGAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACAAGGAAGACTTCGGAGCGGTCCCAGCCACGTGGGAGG 
123 CCCtAGGTTGGGTGTGCGCaCGACAAGGAAGACTTCGGAGCGGTCCCAGCCACGTGGGAGG 

CCCcAGGTTGGGTGTGCGCgCGACAAGGAAGACTTCGGAGCGgTCCCAGCCaCGTGGgAGG 


184 CGCCAGCCCATCCCtAAAGATCGGCGCTCCACTGGCAAGTCCTGGGGAAAACCAGGATAcC
184 CGCCAGCCCATCCCCAAAGATCGGCGCTCCACTGGCAAGTCCTGGGGAAAACCAGGATAtC 
184 CGCCAGCCCATCCCCAAAGATCGGCGCcCCACTGGCAAGTCCTGGGGAAAACCAGGATACC 
184 CGCCAGCCCATCCCCAAAGATCGGCGCtCCACTGGCAAGTCCTGGGGAAAACCAGGATACC 

CGCCAGCCCATCCCcAAAGATCGGCGCtCCACTGGCAAGTCCTGGGGAAAACCAGGATAcC 


245 CCTGGCCCCTGTATGGGAATGAGGGgCTCGGCTGGGCAGGATGGCTCCTGTCCCCCCGAGG 
245 CCTGGCCCCTGTATGGGAATGAGGGACTCGGCTGGGCAGGATGGCTCCTGTCCCCCCGAGG 
245 CtTGGCCCCTATATGGGAATGAGGGACTCGGCTGGGCAGGATGGCTCCTGTCCCCCCGAGG 
245 CcTGGCCtCTATATGGGAATGAGGGACTCGGCTGGGCgGGATGGCTCCTGTCCCCCCGAGG

CcTGGCCcCT-TATGGGAATGAGGGaCTCGGCTGGGCaGGATGGCTCCTGTCCCCCCGAGG


306 TTCtCGTCCCTCtTGGGGCCCCAATGACCCCCGGCATAGGTCGCGCAAtGTGGGTAAaGTC 
306 TTCCCGTCCCTCcTGGGGCCCCAATGACCCCCGGCATAGGTCGCGCAACGTGGGTAAGGTC 
306 TTCCCGTCCCTCTTGGGGCCCCAcTGAtCCCCGGCATAGGTCGCGCAACGTGGGTAAGGTC 
306 TTCCCGTCCCTCTTGGGGCCCCAgTGAcCCCCGGCATAGGTCGCGCAACGTGGGTAAGGTC 

TTCcCGTCCCTCtTGGGGCCCCAaTGAcCCCCGGCATAGGTCGCGCAAcGTGGGTAAgGTC 


367 ATCGATACCCTAACGTGCgGCtTTGCCGACCTCATGGGGTACaTCCCCGTCGTAGGCGcCC 
367 ATCGATACCCTAACGTGCaGCcTTGCCGACCTCATGGGGTACgTCCCCGTCGTAGGCGgCC 
367 ATCGATACCCTAACGTGCGGCTTTGCCGACCTCATGGGaTACATCCCCGTCGTgGGCGCtC 
367 ATCGATACCCTAACGTGCGGCTTTGCCGACCTCATGGGgTACATCCCCGTCGTaGGCGCcC 

ATCGATACCCTAACGTGCgGCtTTGCCGACCTCATGGGgTACaTCCCCGTCGTaGGCGccC 


428 CGcTtGGTGGtGTCGCCAGAGCTCTtGCGCATGGCGTGAGAGTCCTGGAGGACGGaGTTAA 
125 T4 428 CGtTgGGTGGCGTCGCCAGAGCTCTCGCGCATGGCGTGAGAGTCCTGGAGGACGGGGTTAA

126 US10 428 CGCTTGGTGGCGTCGCCAGAGCTCTCGCGCATGGCGTGAGgGTCCTGGAGGACGGGGTTAA

127 T9 428 CGCTTGGTGGCGTtGCCAGAGCTCTCGCGCAcGGCGTGAGaGTCCTGGAGGACGGGGTTAA 

125-128 consensus CGcTtGGTGGcGTcGCCAGAGCTCTcGCGCAtGGCGTGAGaGTCCTGGAGGACGGgGTTAA 


ISOLATE 

T2 489 TTATGCAACAGGtAACTTACCcGGTTGCTCCTTTTCTATcTTCTTGCTaGCCCTgCTGTCC 

T4 489 TTATGCAACAGGGAACTTACCtGGTTGCTCCTTTTCTATtTTCTTGCTGGCCCTACTGTCC

US10 489 TTATGCAACAGGGAACTTACCcGGTTGCTCCTTTTCTATCTTCTTGCTGGCCtTACTGTCC

T9 489 TTATGCAACAGGGAACcTACCtGGTTGCTCtTTTTCTATCTTCTTGCTGGCCcTACTGTCC

125-128 consensus TTATGCAACAGGgAACtTACC-GGTTGCTCcTTTTCTATcTTCTTGCTgGCCcTaCTGTCC


ISOLATE 

T2 550 TGCATCACtATTCCgGTtTCaGCT

T4 550 TGCATCACCATTCCAGTCTCcGCT

US10 550 TGCATCACCATTCCAGTCTCTGCT

T9 550 TGCATCACCAcTCCgGCCTCTGCT

125-128 consensus TGCATCACcAtTCC-GtcTCtGCT 


SEQ ID NO:
128 

125 

126 
127 


SEQ ID NO:
128 

125 

126 
127 
SEQ ID NO: ISOLATE
131 DK11
132 SW3
133 DK8
129 T8


1 ATGAGCACAAATCCTAAACCTCAAAGAAAAACCAAAAGAAATACAAACCGCCGCCCACAGG 
1 ATGAGCACAAATCCTAAACCTCAAAGAAAAACCAAAAGAAATACAAACCGCCGCCCACAGG 

1 ATGAGCACAAATCCTAAACCTCAAAGAAAAACCAAAAGAAACACAAACCGCCGCCCACAGG
1 ATGAGCACAAATCCTAAACCTCAAAGAAAAACCAAAAGAAACACAAACCGCCGCCCACAGG

ATGAGCACAAATCCTAAACCTCAAAGAAAAACCAAAAGAAAcACAAACCGCCGCCCACAGG


62 ACGTTAAGTTCCCGGGTGGCGGCCAGATCGTTGGCGGAGTTTACTTGCTGCCGCGCAGGGG
62 ACGTTAAGTTCCCGGGTGGCGGCCAGATCGTTGGCGGAGTTTACTTGCTGCCGCGCAGGGG 
62 ACGTTAAGTTCCCGGGTGGCGGCCAGATCGTTGGCGGAGTTTACTTGCTGCCGCGCAGGGG 
62 ACGTCAAGTTCCCGGGTGGCGGCCAGATCGTTGGCGGAGTTTACTTGCTGCCGCGCAGGGG 
62 ACGTCAAGTTCCCGGGTGGCGGtCAGATCGTTGGCGGAGTTTACTTGCTGCCGCGCAGGGG 

ACGTtAAGTTCCCGGGTGGCGGcCAGATCGTTGGCGGAGTTTACTTGCTGCCGCGCAGGGG 


123 CCCCAGGTTGGGTGTGCGCaCGACAAGGAAGACTTCCGAGCGATCCCAGCCGCGTGGGAGA 
123 CCCCAGGTTGGGTGTGCGCGCGACAAGGAAGACTTCCGAGCGATCCCAGCCGCGTGGGAGA 
123 CCCCAGGTTGGGTGTGCGCGCGACAAGGAAGtCTTCCGAGCGATCCCAGCCGCGTGGGAGg 
123 CCCtAGGTTGGGTGTGCGCGCGACAAGGAAGACTTCCGAGCGATCCCAGCCGCGTGGGAGA
123 CCCcAGGTTGGGTGTGCGCGCGACAAGGAAGACTTCCGAGCGATCCCAGCCGCGTGGGAGA 

CCCcAGGTTGGGTGTGCGCgCGACAAGGAAGaCTTCCGAGCGATCCCAGCCGCGTGGGAGa 


184 CGCCAGCCCATCCCGAAAGATCGGCGCTCCACCGGCAAGcCCTGGGGAAAGCCAGGATATC 
184 CGCCAGCCCATCCCGAAAGATCGGCGCTCCACCGGCAAGTCCTGGGGAAAGCCAGGATATC
184 CGCCAGCCCATCCCGAAAGATCGGCGCTCCACCGGCAAGTCCTGGGGAAAACCgGGATATC 
184 CGCCAGCCCATCCCGAAAGATCGGCGCTCCACCGGCAAGTCCTGGGGAAAACCAGGATATC 
184 CGCCAGCCCATCCCGAAAGATCGGCGCTCCACCGGCAAGTCCTGGGGAAAgCCAGGATATC 

CGCCAGCCCATCCCGAAAGATCGGCGCTCCACCGGCAAGtCCTGGGGAAAgCCaGGATATC 


245 CTTGGCCCCTGTATGGAAACGAGGGCTGCGGCTGGGCAGGTTGGCTCCTGTCCCCCCGCGG 
245 CTTGGCCCCTGTATGGAAACGAGGGCTGCGGCTGGGCAGGTTGGCTCCTGTCCCCCCGCGG 
245 CTTGGCCCCTGTATGGAAACGAGGGCTGCGGCTGGGCAGGTTGGCTCCTGTCCCCCCGCGG 
245 CTTGGCCTCTtTACGGAAACGAGGGCTGCGGtTGGGCAGGTTGGCTCCTGTCCCCCCGCGG 
245 CTTGGCCTCTgTACGGAAACGAGGGCTGCGGcTGGGCAGGTTGGCTCCTGTCCCCCCGCGG

CTTGGCCcCTgTAtGGAAACGAGGGCTGCGGcTGGGCAGGTTGGCTCCTGTCCCCCCGCGG


306 GTCTCATCCTAATTGGGGCCCCACTGACCCCCGGCATAaATCACGCAATTTGGGtAAAGTC 
306 GTCTCATCCTAATTGGGGCCCCACTGACCCCCGGCATAGATCACGCAATTTGGGCAAAGTC 
306 GTCTCGTCCTACTTGGGGCCCCACTGACCCCCGGCATAGATCACGCAATTTGGGCAAAGTC 
306 GTCTCGTCCTACTTGGGGCCCCACTGACCCCCGGCATAGATCACGTAATTTGGGCAgAGTC 
306 GTCTCGTCCTACTTGGGGCCCCACTGACCCCCGGGAcAGATCACGTAAcTTGGGCAagGTC 

GTCTCgTCCTAcTTGGGGCCCCACTGACCCCCGGCAtAgATCACGcAAtTTGGGcAaaGTC 


367 ATCGACACCATTACGTGTGGTTTTGCCGACCTCATGGGGTACATCCCTGTCGTcGGCGCCC 
367 ATCGACACCATTACGTGTGGTTTTGCCGACCTCATGGGGTACATCCCTGTCGTTGGCGCCC 
367 ATCGACACCATTACGTGTGGTTTTGCCGACCTCATGGGGTACATCCCTGTCGTTGGCGCCC 
367 ATCGATACCATTACaTGTGGTTTTGCCGACCTCATGGGGTACATCCCTGTCGTTGGCGCCC 
130 US1 367 ATCGATACCATTACgTGTGGTTTTGCCGACCTCATGGGGTACATCCCTGTCGTTGGCGCCC

129-133 consensus ATCGAcACCATTACgTGTGGTTTTGCCGACCTCATGGGGTACATCCCTGTCGTtGGCGCCC 
428 CGGTCGGAGGCGTCGCCAGAGCTCTGGCACACGGTGTTAGAGTCCTGGAAGACGGGATAAA
428 CGGTCGGAGGCGTCGCCAGAGCTCTGGCACACGGTGTTAGAGTCCTGGAAGACGGGATAAA 
428 CGGTtGGAGGCGTCGCCAGAGCTCTGGCACACGGTGTTAGGGTCCTGGAAGACGGGATAAA 
428 CGGTCGGAGGCGTCGCCAGAGCTCTGGCACAtGGTGTTAGGGTCCTGGAAGACGGGATAAA 
428 CGGTCGGAGGCGTCGCCAGAGCTCTGGCACAcGGTGTTAGGGTCCTGGAAGACGGGATAAA 


129-133 consensus 


CGGTcGGAGGCGTCGCCAGAGCTCTGGCACAcGGTGTTAGgGTCCTGGAAGACGGGATAAA 
489 TTACGCAACAGGGAATCTGCCTGGTTGCTCTTTTTCTATCTTCTTACTTGCTCTTCTGTCa

489 TTACGCAACAGGGAATTTGCCTGGTTGCTCTTTTTCTATCTTCTTGCTTGCTCTTCTGTCG
489 cTAtGCAACAGGGAATTTGCCTGGTTGCTCTTTTTCTATCTTCTTGCTTGCTCTTCTGTCa
489 tTAcGCAACAGGGAATcTGCCTGGTTGCTCcTTTTCTATCTTCTTaCTTGCTCTTCTGTCg


129-133 consensus 


tTAcGCAACAGGGAATcTGCCTGGTTGCTCtTTTTCTATCTTCTTaCTTGCTCTTCTGTCg
550 TGCTgCACAGTGCCAGTGTCTGCG
550 TGCTtCACAGTGCCAGTGTCTGCG 
550 TGCTgCACAGTGCCAGTGTCTGCG 
550 TGCTtCACAGTGCCAGTGTCTGCA 
550 TGCgcCACgGTGCCgGTGTCTGCA 


129-133 consensus 


TGCt-CACaGTGCCaGTGTCTGCg
-134 consensus

SEQ ID NO: ISOLATE
131 DK11
132 SW3
133 DK8
129 T8


1 ATGAGCACAAATCCTAAACCTCAAAGAAAAACCAAAAGAAATACAAACCGCCGCCCACAGG 
1 ATGAGCACAAATCCTAAACCTCAAAGAAAAACCAAAAGAAATACAAACCGCCGCCCACAGG 
1 ATGAGCACAAATCCTAAACCTCAAAGAAAAACCAAAAGAAACACAAACCGCCGCCCACAGG 
1 ATGAGCACAAATCCTAAACCTCAAAGAAAAACCAAAAGAAACACAAACCGCCGCCCACAGG 
1 ATGAGCACAAATCCTAAACCTCAAAGAAAAACCAAAAGAAACACAAACCGCCGCCCACAGG 
1 ATGAGCACAAATCCTAAACCTCAAAGAAAAACCAAAAGAAACACcAACCGTCGCCCACAGG 

1 ATGAGCACAAATCCaAAACCcCAAAGAAAAACCAtAAGAAACACcAACCGTCGCCCACAgG 
1 ATGAGCACAAtTCCTAAACCTCAAAGAAAAACCAAAAGAAACACTAACCGTCGCCCACAaG 
1 ATGAGCACAAaTCCTAAACCTCAAAGAAAAACCAAAAGAAACACTAACCGcCGCCCACAgG 

ATGAGCACAAaTCCtAAACCtCAAAGAAAAACCAaAAGAAAcACaAACCGcCGCCCACAgG 


62 ACGTTAAGTTCCCGGGTGGCGGCCAGATCGTTGGCGGAGTTTACTTGCTGCCGCGCAGGGG
62 ACGTTAAGTTCCCGGGTGGCGGCCAGATCGTTGGCGGAGTTTACTTGCTGCCGCGCAGGGG
62 ACGTTAAGTTCCCGGGTGGCGGCCAGATCGTTGGCGGAGTTTACTTGCTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGTGGCGGCCAGATCGTTGGCGGAGTTTACTTGCTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGTGGCGGtCAGATCGTTGGCGGAGTTTACTTGCTGCCGCGCAGGGG
62 ACGTTAAGTTCCCGGGCGGCGGCCAGATCGTTGGCGGAGTATACTTGTTGCCGCGCAGGGG
62 ACGTTAAGTTtCCGGGCGGCGGCCAGATCGTTGGCGGAGTATACTTGTTGCCGCGCAGGGG
62 ACGTTAAGTTcCCGGGCGGCGGCCAGATCGTTGGCGGAGTATACTTGTTGCCGCGCAGGGG
62 ACGTTAAGTTtCCGGGCGGCGGCCAGATCGTTGGCGGAGTATACTTGCTGCCGCGCAGGGG
62 ACGTcAAGTTcCCGGGCGGtGGCCAGATCGTTGGCGGAGTATACTTGCTGCCGCGCAGGGG

ACGTtAAGTTcCCGGG-GGcGGcCAGATCGTTGGCGGAGT-TACTTGcTGCCGCGCAGGGG


123 CCCCAGGTTGGGTGTGCGCaCGACAAGGAAGACTTCCGAGCGATCCCAGCCGCGTGGGAGA 
123 CCCCAGGTTGGGTGTGCGCGCGACAAGGAAGACTTCCGAGCGATCCCAGCCGCGTGGGAGA 
123 CCCCAGGTTGGGTGTGCGCGCGACAAGGAAGtCTTCCGAGCGATCCCAGCCGCGTGGGAGg 
123 CCCtAGGTTGGGTGTGCGCGCGACAAGGAAGACTTCCGAGCGATCCCAGCCGCGTGGGAGA 
123 CCCCAGGTTGGGTGTGCGCGCGACAAGGAAGACTTCCGAGCGATCCCAGCCGCGTGGGAGA 
123 CCCCAGGTTGGGTGTGCGCGCGACAAGGAAGACTTCGGAGCGATCCCAGCCACGTGGGAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACAAGGAAGACTTCGGAGCGGTCCCAGCCACGTGGGAGG 
123 CCCtAGGTTGGGTGTGCGCaCGACAAGGAAGACTTCGGAGCGGTCCCAGCCACGTGGGAGG 
123 CCCcAGGTTGGGTGTGCGCGCGACAAGGAAGACTTCGGAGCGGTCCCAGCCtCGTGGaAGG 
123 CCCgAGaTTGGGTGTGCGCGCGACgAGGAAaACTTCcGAaCGGTCCCAGCCaCGTGGgAGG 

CCCcAGgTTGGGTGTGCGCgCGACaAGGAAgaCTTCcGAgCGaTCCCAGCCgCGTGGgAGg 


184 CGCCAGCCCATCCCGAAAGATCGGCGCTCCACCGGCAAGcCCTGGGGAAAGCCAGGATATC 
184 CGCCAGCCCATCCCGAAAGATCGGCGCTCCACCGGCAAGTCCTGGGGAAAGCCAGGATATC 
184 CGCCAGCCCATCCCGAAAGATCGGCGCTCCACCGGCAAGTCCTGGGGAAAACCgGGATATC 
184 CGCCAGCCCATCCCGAAAGATCGGCGCTCCACCGGCAAGTCCTGGGGAAAACCAGGATATC 
184 CGCCAGCCCATCCCGAAAGATCGGCGCTCCACCGGCAAGTCCTGGGGAAAgCCAGGATATC 
184 CGCCAGCCCATCCCCAAAGATCGGCGCTCCACTGGCAAGTCCTGGGGAAAACCAGGATATC 
184 CGCCAGCCCATCCCCAAAGATCGGCGCcCCACTGGCAAGTCCTGGGGAAAACCAGGATACC 
184 CGCCAGCCCATCCCCAAAGATCGGCGCTCCACTGGCAAGTCCTGGGGAAAACCAGGATACC
184 CGCCAGCCCATCCCTAAAGATCGGCGCTCCACTGGCAAGTCCTGGGGAAAACCAGGATACC
184 CGCCAGCCCATCCCTAAAGATCGGCGCaCCACTGGCAAGTCCTGGGGAAggCCAGGATACC

CGCCAGCCCATCCCgAAAGATCGGCGCtCCAC-GGCAAGtCCTGGGGAAaaCCaGGATAtC


245 CTTGGCCCCTGTATGGAAACGAGGGCTGCGGCTGGGCAGGTTGGCTCCTGTCCCCCCGCGG 
245 CTTGGCCCCTGTATGGAAACGAGGGCTGCGGCTGGGCAGGTTGGCTCCTGTCCCCCCGCGG 
245 CTTGGCCCCTGTATGGAAACGAGGGCTGCGGCTGGGCAGGTTGGCTCCTGTCCCCCCGCGG 
245 CTTGGCCTCTtTACGGAAACGAGGGCTGCGGtTGGGCAGGTTGGCTCCTGTCCCCCCGCGG 
245 CTTGGCCTCTGTACGGAAACGAGGGCTGCGGCTGGGCAGGTTGGCTCCTGTCCCCCCGCGG
245 CcTGGCCCCTGTATGGGAATGAGGGACTCGGCTGGGCAGGATGGCTCCTGTCCCCCCGAGG
245 CtTGGCCCCTATATGGGAATGAGGGACTCGGCTGGGCAGGATGGCTCCTGTCCCCCCGAGG
245 CCTGGCCtCTATATGGGAATGAGGGACTCGGCTGGGCgGGATGGCTCCTGTCCCCCCGAGG
245 CCTGGCCCCTGTATGGGAATGAGGGgCTCGGCTGGGCAGGATGGCTCCTGTCCCCCCGAGG
245 CtTGGCCCCTGTATGGGAATGAGGGcCTCGGCTGGGCAGGgTGGCTCCTGTCCCCCCGcGG

CtTGGCCcCTgTAtGG-AA-GAGGGc--CGGcTGGGCaGGtTGGCTCCTGTCCCCCCGcGG


306 GTCTCATCCTAATTGGGGCCCCACTGACCCCCGGCATAaATCACGCAATTTGGGtAAAGTC 
306 GTCTCATCCTAATTGGGGCCCCACTGACCCCCGGCATAGATCACGCAATTTGGGCAAAGTC 
306 GTCTCGTCCTACTTGGGGCCCCACTGACCCCCGGCATAGATCACGCAATTTGGGCAAAGTC 
306 GTCTCGTCCTACTTGGGGCCCCACTGACCCCCGGCATAGATCACGTAATTTGGGCAgAGTC 
306 GTCTCGTCCTACTTGGGGCCCCACTGACCCCCGGCAcAGATCACGTAACTTGGGCAAGGTC 
306 TTCCCGTCCCTCcTGGGGCCCCAaTGACCCCCGGCATAGGTCGCGCAACGTGGGTAAGGTC 
306 TTCCCGTCCCTCTTGGGGCCCCAcTGAtCCCCGGCATAGGTCGCGCAACGTGGGTAAGGTC 
306 TTCCCGTCCCTCTTGGGGCCCCAgTGACCCCCGGCATAGGTCGCGCAACGTGGGTAAGGTC 
306 TTCTCGTCCCTCTTGGGGCCCCAaTGACCCCCGGCATAGGTCGCGCAAtGTGGGTAAaGTC 
306 TTCTCGcCCtTCaTGGGGCCCCAccGACCCCCGGCATAaaTCGCGCAActTGGGTAAgGTC 

-TCtCgtCCt-ctTGGGGCCCCActGAcCCCCGGCAtAgaTC-CGcAA-tTGGGtAa-GTC


367 ATCGACACCATTACGTGTGGTTTTGCCGACCTCATGGGGTACATCCCTGTCGTcGGCGCCC 
367 ATCGACACCATTACGTGTGGTTTTGCCGACCTCATGGGGTACATCCCTGTCGTTGGCGCCC 
367 ATCGACACCATTACGTGTGGTTTTGCCGACCTCATGGGGTACATCCCTGTCGTTGGCGCCC 
367 ATCGATACCATTACaTGTGGTTTTGCCGACCTCATGGGGTACATCCCTGTCGTTGGCGCCC 
367 ATCGATACCATTACGTGTGGTTTTGCCGACCTCATGGGGTACATCCCTGTCGTTGGCGCCC 
367 ATCGATACCCTAACGTGCaGCcTTGCCGACCTCATGGGGTACgTCCCCGTCGTaGGCGgCC 
367 ATCGATACCCTAACGTGCGGCTTTGCCGACCTCATGGGaTACATCCCCGTCGTgGGCGCtC 
367 ATCGATACCCTAACGTGCGGCTTTGCCGACCTCATGGGGTACATCCCCGTCGTAGGCGCCC 
367 ATCGATACCCTAACGTGCGGCTTTGCCGACCTCATGGGGTACATCCCCGTCGTAGGCGCCC 
367 ATCGATACCCTAACGTGCGGtTTTGCCGACCTCATGGGGTACATaCCCGTCGTtGGCGCtC 

ATCGAtACC-T-ACgTG-gGttTTGCCGACCTCATGGGgTACaTcCC-GTCGTtGGCGccC


428 CGGTCGGAGGCGTCGCCAGAGCTCTGGCACACGGTGTTAGAGTCCTGGAAGACGGGATAAA
428 CGGTCGGAGGCGTCGCCAGAGCTCTGGCACACGGTGTTAGAGTCCTGGAAGACGGGATAAA
428 CGGTtGGAGGCGTCGCCAGAGCTCTGGCACACGGTGTTAGGGTCCTGGAAGACGGGATAAA
428 CGGTCGGAGGCGTCGCCAGAGCTCTGGCACAtGGTGTTAGGGTCCTGGAAGACGGGATAAA
428 CGGTCGGAGGCGTCGCCAGAGCTCTGGCACAcGGTGTTAGGGTCCTGGAAGACGGGATAAA
428 CGtTgGGTGGCGTCGCCAGAGCTCTCGCGCATGGCGTGAGaGTCCTGGAGGACGGGGTTAA 
428 CGCTTGGTGGCGTCGCCAGAGCTCTCGCGCATGGCGTGAGgGTCCTGGAGGACGGGGTTAA 
428 CGCTTGGTGGCGTtGCCAGAGCTCTCGCGCAcGGCGTGAGAGTCCTGGAGGACGGGGTTAA 
428 CGCTTGGTGGtGTcGCCAGAGCTCTtGCGCATGGCGTGAGAGTCCTGGAGGACGGaGTTAA
428 CcgTTGGcGGcGTtGCCAGAGCcCTcGCcCATGGgGTGAGgGTtCTGGAGGACGGgaTaAA 

CggTtGGaGGcGTcGCCAGAGCtCTgGCaCA-GGtGT-AG-GTcCTGGA-GACGGgaTaAA


489 

489 

489 

489 

489 

489 

489 

489 

489 



TTATGCAACAGGGAACTTACCTGGTTGCTCCTTTTCTATtTTCTTGCTGGCCCTACTGTCC 
TTATGCAACAGGGAACTTACCCGGTTGCTCCTTTTCTATCTTCTTGCTGGCCtTACTGTCC
TTATGCAACAGGGAACcTACCtGGTTGCTCtTTTTCTATCTTCTTGCTGGCCCTACTGTCC
TTATGCAACAGGtAACTTACCCGGTTGCTCcTTTTCTATCTTCTTGCTaGCCCTgCTGTCC 



FIGURE 6F 


134 S83 489 TTATGCAACgGGgAAtTTgCCCGGTTGCTCtTTcTCTATCTTtcTctTgGCCCTctTGTCt 

125-134 consensus tTAtGCAACaGGgAAttTgCCtGGTTGCTCtTTtTCTATcTTctTgcTtGC-cTtcTGTCc
550 TGCATCACCATTCCAGTCTCcGCT
550 TGCATCACCATTCCAGTCTCTGCT 
550 TGCATCACCAcTCCGGcCTCTGCT 
550 TGCATCACTATTCCGGTTTCaGCT 
550 TGCATCtCTgTgCCaGTTTCcGCc 


125-134 consensus 


TGCatCaCagtgCCaGtgTCtGCt 
consensus

SEQ ID NO:

ISOLATE 

138 
1 ATGAGCACACTTCCTAAACCTCAAAGAAAAACCAAAAGAAACACCATCCGTCGCCCACAGG
1 ATGAGCACACTTCCTAAACCTCAAAGAAAAACCAAAAGAAACACCATCCGTCGCCCACAGG 
1 ATGAGCACACTTCCTAAACCTCAAAGAAAAACCAAAAGAAACACCATCCGTCGCCCACAGG 
1 ATGAGCACACTTCCTAAACCTCAAAGAAAAACCAAAAGAAACACCATCCGTCGCCCACAGG

ATGAGCACACTTCCTAAACCTCAAAGAAAAACCAAAAGAAACACCATCCGTCGCCCACAGG 


62 ACGTcAAGTTCCCGGGTGGCGGACAGATCGTTGGTGGAGTATACGTGTTGCCGCGCAGGGG
62 ACGTTAAGTTCCCGGGTGGCGGACAGATCGTTGGTGGAGTATACGTGTTGCCGCGCAGGGG
62 ACGTTAAGTTCCCGGGTGGCGGACAGATCGTTGGTGGAGTATACGTGTTGCCGCGCAGGGG
62 ACaTcAAGTTCCCGGGTGGCGGACAGATCGTTGGTGGAGTATACGTGTTGCCGCGCAGGGG

ACgT-AAGTTCCCGGGTGGCGGACAGATCGTTGGTGGAGTATACGTGTTGCCGCGCAGGGG


123 CCCACGATTGGGTGTGCGCGCGACGCGTAAAACTTCTGAACGGTCaCAGCCTCGCGGACGg 
123 CCCACGATTGGGTGTGCGCGCGACGCGTAAAACTTCTGAACGGTCgCAGCCTCGCGGACGA 
123 CCCACGATTGGGTGTGCGCGCGACGCGTAAAACTTCTGAACGGTCACAGCCTCGCGGACGA 
123 CCCACGATTGGGTGTGCGCGCGACGCGTAAAACTTCTGAACGGTCACAGCCTCGCGGACGg 

CCCACGATTGGGTGTGCGCGCGACGCGTAAAACTTCTGAACGGTCaCAGCCTCGCGGACG- 


184 CGACAGCCTATCCCCAAGGCGCGTCGGAGCGAAGGCCGGTCCTGGGCTCAGCCtGGGTACC 
184 CGACAGCCTATCCCCAAGGCGCGTCGGAGCGAAGGCCGGTCCTGGGCTCAGCCCGGGTACC 
184 CGACAGCCTATCCCCAAGGCGCGTCGGAGCGAAGGCCGGTCCTGGGCTCAGCCCGGGTACC 
184 CGACAGCCTATCCCCAAGGCGCGTCGGAGCGAAGGCCGaTCCTGGGCTCAGCCCGGGTACC 

CGACAGCCTATCCCCAAGGCGCGTCGGAGCGAAGGCCGgTCCTGGGCTCAGCCcGGGTACC 


245 CTTGGCCCCTCTATGGTAACGAGGGCTGCGGGTGGGCAGGgTGGCTCCTGTCCCCACGCGG 
245 CTTGGCCCCTCTATGGTAACGAGGGCTGCGGGTGGGCAGGaTGGCTCCTGTCCCCACGCGG 
245 CTTGGCCCCTCTATGGTAAtGAGGGCTGCGGGTGGGCAGGGTGGCTCCTGTCCCCACGCGG 
245 CTTGGCCCCTCTATGGTAAcGAGGGCTGCGGGTGGGCAGGGTGGCTCCTGTCCCCACGCGG 

CTTGGCCCCTCTATGGTAAcGAGGGCTGCGGGTGGGCAGGgTGGCTCCTGTCCCCACGCGG 


306 CTCCCGTCCATCTTGGGGCCCAAACGACCCCCGGCGgaGGTCCCGCAATTTGGGTAAgGTC 
306 CTCCCGTCCATCTTGGGGCCCAAACGACCCCCGGCGacGGTCCCGCAATTTGGGTAAAGTC 
306 CTCCCGTCCATCTTGGGGCCCAAACGACCCCCGGCGGAGGTCCCGCAATTTGGGTAAAGTC 
306 CTCCCGTCCATCTTGGGGCCCAAAtGACCCCCGGCGGAGGTCCCGCAATTTGGGTAAAGTC 

CTCCCGTCCATCTTGGGGCCCAAAcGACCCCCGGCGgaGGTCCCGCAATTTGGGTAAaGTC 


367 ATCGATACCCTcACGTGCGGATTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGCGCTC 
367 ATCGATACCCTTACGTGCGGATTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGCGCTC 
367 ATCGATACCCTTACGTGCGGATTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGCGCTC 
367 ATCGATACCCTTACGTGCGGcTTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGCGCTC 

ATCGATACCCTtACGTGCGGaTTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGCGCTC 


428 CtGTAGGgGGCGTCGCAAGAGCCCTCGCGCATGGCGTGAGGGCCCTTGAAGACGGGATAAA 





FIGURE 6G 
S52 428 CCGTAGGAGGCGTCGCAAGAGCCCTCGCGCATGGCGTGAGGGCCCTTGAAGACGGGATAAA

S2 428 CCGTAGGAGGCGTCGCAAGAGCCCTCGCGCATGGCGTGAGGGCCCTTGAAGACGGGATAAA 


135-138 consensus 


CcGTAGGaGGCGTCGCAAGAGCCCTCGCGCATGGCGTGAGGGCCCTTGAAGACGGGATAAA 
489 TTTCGCAACAGGGAACTTGCCCGGTTGCTCCTTTTCTATCTTCCTTCTTGCTCTGTTCTCT
489 TTTCGCAACAGGGAACTTGCCCGGTTGCTCCTTTTCTATCTTCCTTCTTGCTCTGTTCTCT
489 TTTTGCAACAGGGAACTTGCCCGGTTGCTCCTTTTCTATCTTCCTTCTTGCTCTGTTCTCC
489 TTTTGCAACAGGGAACTTGCCCGGTTGCTCtTTTTCTATCTTCCTTCTTGCcCTGTTCTCt


135-138 consensus 


TTT-GCAACAGGGAACTTGCCCGGTTGCTCcTTTTCTATCTTCCTTCTTGCtCTGTTCTCt
550 TGCcTAATTCATCCAGCAGCTAGT
550 TGCTTAATTCATCCAGCAGCTAGT 
550 TGCTTAgTTCATCCtGCAGCTAGT 
550 TGCTTAaTTCATCCaGCAGCTAGT 


135-138 consensus 


TGCtTAaTTCATCCaGCAGCTAGT 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCaATGG
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCCATGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCCATGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCtATGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCCATGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGCCGCCCCATGG 
1 ATGAGCACaAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGtCGCCCCATGG

ATGAGCACgAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGcCGCCCcATGG 


62 ACGTTAAGTTCCCGGGTGGcGGCCAGATCGTTGGCGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTTAAGTTCCCGGGTGGTGGCCAGATCGTTGGCGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTTAAGTTCCCGGGCGGTGGCCAGATCGTTGGCGGAGTTTACTTGTTGCCGCGCAGGGG
62 AtGTAAAaTTCCCaGGCGGcGGCCAGATCGTTGGCGGAGTTTACTTGTTGCCGCGCAGGGG
62 AcGTAAAgTTCCCGGGTGGTGGCCAGATCGTTGGCGGAGTTTACTTGTTGCCGCGCAGGGG 
62 ATGTAAAATTCCCGGGTGGTGGtCAGATCGTTGGCGGAGTTTACTTGTTGCCGCGCAGGGG 
62 ATGTgAAATTCCCGGGcGGcGGcCAGATCGTTGGCGGAGTTTACTTGcTGCCGCGCAGGGG 

AcGT-AAgTTCCCgGGtGGtGGcCAGATCGTTGGCGGAGTTTACTTGtTGCCGCGCAGGGG 


123 CCCtAGaTTGGGTGTGCGCGCGACTAGGAAGACTTCGGAGCGGTCGCAACCTCGTGGGAGg 
123 CCCCAGgTTGGGTGTGCGCGCGACTAGGAAGACTTCGGAGCGGTCGCAACCTCGTGGGAGA 
123 CCCCAGaTTGGGTGTGCGCaCaACTAGGAAGACTTCGGAGCGGTCGCAACCTCGTGGGAGA 
123 CCCCAGGTTGGGTGTGCGCGCGACTCGGAAGACTTCGGAGCGGTCGCAACCTCGTGGCAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTCGaAAGACTTCGGAGCGGTCGCAACCTCGTGGCAGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTCGGAAGACTTCGGAGCGGTCGCAACCTCGcGGCAGG 
123 CCCCcGGTTGGGTGTGCGCGCagCTCGGAAGACTTCGGAGCGGTCaCAACCTCGtGGCAGG 

CCCcaGgTTGGGTGTGCGCgCgaCTcGgAAGACTTCGGAGCGGTCgCAACCTCGtGGcAGg 


184 CGCCAGCCTATCCCCAAGGCgCGcCaActcGAGGGtAGGTCCTGGGCTCAGCCtGGGTATC
184 CGCCAGCCTATCCCCAAGGCACGTCGATCTGAGGGAAGGTCCTGGGCTCAGCCCGGGTATC 
184 CGTCAGCCTATCCCCAAGGCACGTCGATCTGAGGGAAGGTCCTGGGCTCAaCCCGGGTACC 
184 CGTCAGCCTATCCCCAAGGCACGTCGGTCcGAGGGtAGGTCCTGGGCTCAGCCCGGGTACC 
184 CGTCAaCCTATCCCCAAGGCgCGcCaGcCaGAGGGCAGaTCCTGGGCgCAGCCCGGGTACC 
184 CGTCAGCCTATCCCCcAGGCaCGtCGGTCCGAGGGCAGGTCCTGGGCTCAGCCCGGGTACC 
184 CGTCAGCCTATCCCCaAGGCgCGcCGGTCCGAGGGCAGGTCCTGGGCTCAGCCCGGGTACC 

CGtCAgCCTATCCCCaAGGCaCGtCggtccGAGGGcAGgTCCTGGGCtCAgCCcGGGTAcC 


245 CtTGGCCcCTTTACGGcAATGAGGGcTGCGGGTGGGCGGGATGGCTCCTGTCACCCCGTGG
245 CATGGCCTCTTTACGGTAATGAGGGTTGCGGGTGGGCGGGATGGCTCCTGTCACCCCGTGG 
245 CATGGCCTCTTTACGGTAAcGAGGGTTGCGGGTGGGCAGGATGGCTCtTGTCACCCCGTGG 
245 CATGGCCTCTTTACGGTAATGAaGGCTGtGGGTGGGCAGGtTGGCTCCTGTCcCCCCGCGG 
245 CTTGGCCcCTcTATGGCAATGAGGGCTGcGGGTGGGCAGGGTGGCTCCTGTCtCCtCGCGG
245 CTTGGCCtCTTTATGGCAATGAGGGCTGTGGGTGGGCAGGGTGGCTCCTGTCCCCCCGCGG
245 CTTGGCCcCTTTAcGGCAATGAGGGCTGTGGGTGGGCAGGGTGGCTCCTGTCCCCCCGCGG 

CtTGGCCtCTtTAcGGcAAtGAgGGcTGcGGGTGGGCaGG-TGGCTCcTGTC-CCcCGcGG


306 CTCTCGgCCGTCTTGGGGcCCgAATGATCCCCGGCGgAGGTCCCGCAACTTGGGTAAGGTC 
306 CTCTCGACCGTCTTGGGGtCCAAATGATCCCCGGCGAAGGTCCCGCAACTTGGGTAAGGTC 
306 CTCTCGACCGTCTTGGGGCCCAAATGATCCCCGGCGAAGGTCCCGCAACTTGGGTAAGGTC 
306 CTCTCGACCGTCTTGGGGCCCAAATGATCCCCGGCGGAGGTCGCGCAATTTGGGTAAGGTC 
Z4 306 CTCTCGGCCATCTTGGGGCCCAAATGATCCCCGGCGGAGaTCGCGCAATCTGGGTAAGGTC
Z5 306 aTCTCGGCCATCTTGGGGCCaAAATGATCCCCGGCGTAGGTCCCGCAATCTGGGTAAGGTC
Z1 306 tTCcaGGCCgTCTTGGGGCCccAATGATCCCCGGCGTAGGTCCCGtAATCTGGGTAAaGTC
cTCtcGgCCgTCTTGGGGcCcaAATGATCCCCGGCGgAGgTCcCGcAAttTGGGTAAgGTC


367 ATCGATACcCTAACTTGCGGcTTCGCCGAcCTCATGGGATACATCCCGgTCGTAGGCGCCC 
367 ATCGATACtCTAACTTGCGGtTTCGCCGAtCTCATGGGATACATCCCGCTCGTAGGCGCCC
367 ATCGATACCCTAACcTGCGGCTTtGCCGACCTCATGGGATACATCCCGCTCGTAGGCGCCC 
367 ATCGATACCCTcACGTGCGGCTTCGCCGACCTCATGGGATACATCCCGCTCGTGGGCGCCC 
367 ATCGATACCCTGACGTGCGGCTTCGCCGACCTCATGGGATACATCCCGaTCGTGGGCGCCC 
367 ATCGATACCCTGACGTGTGGCTTCGCCGACCTCATGGGATACATTCCGCTCGTcGGCGCCC 
367 ATCGATACCCTGACGTGTGGCTTCGCCGACCTCATGGGATACATTCCGCTCGTaGGCGCCC 

ATCGATACcCT-ACgTGcGGcTTcGCCGAcCTCATGGGATACATcCCGcTCGTaGGCGCCC 


428 CCGTGGGtGGCGTCGCCAGaGCCCTGGCgCATGGcGTcAGGctTcTGGAGGACGGGgTCAA 
428 CCGTGGGCGGCGTCGCCAGGGCCCTGGCaCATGGtGTTAGGGCTgTGGAGGACGGGATCAA 
428 CCGTGGGCGGCGTCGCCAGGGCCCTaGCGCATGGCGTTAGGGCTcTGGAGGACGGGATtAA 
428 CaGTaGGaGGCGTCGCCAGaGCCCTGGCGCATGGCGTCAGGGCTGTGGAGGACGGGATcAA 
428 CcGTgGGgGGCGTCGCCAGGGCtCTGGCGCATGGCGTCAGGGCTGTGGAGGACGGGATtAA 
428 CaGTaGGTGGCGTCGCCAGGGCCtTGGCGCATGGCGTCAGGGCCcTGGAGGACGGAATcAA 
428 CtGTgGGTGGCGTCGCCAGGGCCcTGGCGCATGGCGTCAGGGCCgTGGAGGACGGAATtAA 

CcGTgGGtGGCGTCGCCAGgGCccTgGCgCATGGcGTcAGGgctgTGGAGGACGGgaTcAA 


489 TTATGCAACAGGGAATCTTCCCGGTTGCTCTTTCTCTATCTTCCTCTTGGCACTgCTcTCG 
489 TTATGCAACAGGGAATCTTCCCGGTTGCTCTTTCTCTATCTTCCTCTTGGCACTTCTTTCG 

489 CTATGCAACAGGGAACCTTCCtGGTTGCTCTTTCTCTATCTTCCTCTTGGCACTTCTcTCG 
489 CTATGCAACAGGGAATCTTCCcGGTTGCTCTTTCTCTATCTTCCTtTTGGCACTTCTtTCG 
489 CTATGCAACAGGGAATCTTCCTGGTTGCTCcTTtTCTATCTTCCTaCTTGCACTTtTCTCG 
489 CTAcGCAACAGGGAAcCTTCCTGGTTGCTCtTTcTCTATCTTtCTtCTTGCACTTcTCTCG

CTAtGCAACAGGGAAtCTTCCcGGTTGCTCtTTcTCTATCTTcCTctTgGCACTtcTcTCG


550 TGCCTgACTGTTCCCgCtTCGGCC 
550 TGCCTaACTGTTCCCaCCTCGGCC 
550 TGCCTgACTGTTCCCGCCTCGGCC 
550 TGCCTaACcGTcCCAGCGTCtGCT 
550 TGCCTcACtGTtCCAGCGTCgGCT 
550 TGCtTGACAACACCgGCATCcGCT 
550 TGCcTGACAACACCaGCATCtGCc 

TGCcTgACtgttCC-gC-TCgGCc
153 consensus


1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCcAAAGAAACACCAACCGCCGCCCACAGG 
1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAAAGAAACACCAACCGCCGCCCACAGG 

1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAAAGAAACACCAACCGCCGCCCACAGG 

1 ATGAGCACGAATCCTAAACCTCAAAGAAAAACCAAAAGAAACACCAACCGCCGCCCACAGG 


ATGAGCACGAATCCTAAACCTCAAAGAAAAACCaAAAGAAACACCAACCgCCGCCCACAGG 


62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTtAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTcTACTTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG
62 ACGTCAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGG

ACGTcAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTtTACTTGTTGCCGCGCAGGGG 


123 CCCTaGgtTGGGTGTGCGCGCGACTCGGAAGACTTCaGAACGGTCGCAACCCCGTGGgCGG 
123 CCCTcGtaTGGGTGTGCGCGCGACTCGGAAGACTTCgGAACGGTCGCAACCCCGTGGaCGG 
123 CCCTAGgTTGGGTGTGCGCGCGACTCGGAAGACTTCAGAACGGTCGCAACCCCGTGGGCGG 
123 CCCTAGaTTGGGTGTGCGCGCGACTCGGAAGACTTCAGAACGGTCGCAACCCCGTGGGCGG 
123 CCCTAGGTTGGGTGTGCGCGCGACTCGGAAGACTTCAGAACGGTCGCAACCCCGTGGGCGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTCGGAAGACTTCgGAACGGTCGCAACCCCGTGGGCGG 
123 CCCCAGGTTGGGTGTGCGCGCGACTCGGAAGACTTCAGAACGGTCGCAACCCCGTGGACGG 
123 CCCtAGGTTGGGTGTGCGCGCaACTCGGAAGACTTCAGAACGGTCGCAACCCCGTGGACGG 

CCCtaGgtTGGGTGTGCGCGCgACTCGGAAGACTTCaGAACGGTCGCAACCCCGTGGgCGG 


184 CGTCAGCCTATTCCCAAGGCGCGCCAAcCCaCGGGcCGGTCCTGGGGTCAACCCGGGTACC 
184 CGTCAGCCTATTCCCAAGGCGCGCCAAtCCgCGGGtCGGTCCTGGGGTCAACCCGGGTACC 
184 CGCCAGCCTATTCCCAAGGCGCGCCAACCCACGGGCCGGTCCTGGGGTCAACCCGGGTACC 
184 CGCCAGCCTATTCCCAAGGCGCGCCAACCCACGGGCCGGTCCTGGGGTCAACCCGGGTACC 
184 CGCCAGCCTATTCCCAAGGCGCGCCAACCCACGGGCCGGTCCTGGGGTCAACCCGGGTACC 
184 CGCCAGCCTATTCCCAAGGCGCGCCAACCCACGGGCCGGTCCTGGGGTCAACCCGGGTACC 
184 CGCCAGCCTATTCCCAAGGCtCGCCAGCCCACGGGCCGGTCCTGGGGTCAACCCGGGTACC 
184 CGtCAGCCTATcCCCAAGGCgCGCCAGCCCACGGGCCGGTCCTGGGGTCAACCCGGGTACC 

CGcCAGCCTATtCCCAAGGCgCGCCAacCCaCGGGcCGGTCCTGGGGTCAACCCGGGTACC 


245 CTTGGCCCtTTTACGCCAATGAGGGCCTCGGGTGGGCAGGGTGGcTGCTCTCCCCtCGAGG
245 CTTGGCCCCTTTACGCCAATGAGGGCCTCGGGTGGGCAGGGTGGTTGCTCTCCCCCCGAGG
245 CTTGGCCCCTTTACGCCAATGAGGGCCTCGGGTGGGCAGGGTGGTTGCTCTCCCCCCGAGG
245 CTTGGCCCCTTTACGCCAATGAGGGCCTCGGGTGGGCAGGGTGGTTGCTCTCCCCCCGAGG
245 CTTGGCCCCTTTACGCCAATGAGGGCCTCGGGTGGGCAGGGTGGTTGCTCTCCCCCCGAGG
245 CTTGGCCCCTTTACGCCAATGAGGGCCTCGGGTGGGCAGGGTGGTTGCTCTCCCCCCGAGG

245 CTTGGCCCCTTTAtGCCAATGAGGGCCTCGgGTGGGCAGGGTGGTTGCTCTCCCCCCGAGG

CTTGGCCCcTTTAcGCCAATGAGGGCCTCGgGTGGGCAGGGTGGtTGCTCTCCCCcCGAGG
306 CTCTCGGCCTAAcTGGGGCCCCAATGACCCCCGGCGAAgATCGCGCAATTTGGGcAAGGTC
306 CTCTCGGCCTAATTGGGGCCCCAATGACCCCCGGCGAAAATCGCGCAATTTGGGTAAGGTC 
306 CTCTCGGCCTAATTGGGGCCCCAATGACCCCCGGCGAAAgTCGCGCAATTTGGGTAAGGTC 
306 CTCTCGGCCTAATTGGGGCCCCAATGACCCCCGGCGAAAaTCGCGCAATTTGGGTAAGGTC 
306 CTCTCGGCCTAATTGGGGCCCCAATGACCCCCGGCGAAAGTCGCGCAATTTGGGTAAGGTC 
306 CTCTCGGCCTAATTGGGGCCCCAATGACCCCCGGCGGAAGTCGCGCAATTTGGGTAAGGTC 
306 CTCTCGGCCTAgTTGGGGCCCCAAcGACCCCCGGCGGAAATCGCGCAATTTGGGTAAGGTC 
306 CTCTCGGCCTAaTTGGGGCCCCAAtGACCCCCGGCGGAAATCGCGCAAcTTGGGTAAGGTC 

CTCTCGGCCTAatTGGGGCCCCAAtGACCCCCGGCGaAaaTCGCGCAAtTTGGGtAAGGTC 


367 ATCGATACCCTAACGTGCGGATTCGCCGACCTCATGGGGTACATCCCGCTCGTAGGCGGCC 
367 ATCGATACCCTAACGTGCGGATTCGCCGACCTCATGGGGTACATCCCGCTCGTAGGCGGCC 
367 ATCGATACCCTAACGTGCGGATTCGCCGACCTCATGGGGTACATCCCGCTCGTAGGCGGCC 
367 ATCGATACCCTAACGTGCGGATTCGCCGACCTCATGGGGTACATCCCGCTCGTAGGCGGCC 
367 ATCGAcACCCTAACaTGCGGATTCGCCGACCTCATGGGGTACATCCCGCTCGTAGGCGGCC 
367 ATCGATACCCTAACGTGCGGATTCGCCGACCTCATGGGGTACATCCCGCTCGTAGGCGGCC 
367 ATCGATACCCTAACGTGCGGATTCGCCGAtCTGATGGGGTACATCCCGCTCGTAGGCGGCC 
367 ATCGATACCCTgACGTGCGGATTCGCCGAcCTGATGGGGTACATCCCGCTCGTAGGCGGCC 

ATCGAtACCCTaACgTGCGGATTCGCCGAcCTCATGGGGTACATCCCGCTCGTAGGCGGCC 


428 CCGTTGGGGGCGTCGCAAGGGCcCTCGCACACGGTGTGAGaGcTCTTGAGGACGGGGTAAA 
428 CCGTTGGGGGCGTCGCAAGGGCtCTCGCACACGGTGTGAGGGTTCTTGAGGACGGGGTAAA
428 CCGTTGGGGGCGTCGCAAGGGCCCTtGCACATGGTGTGAGGGTTCTTGAGGACGGGGTAAA 
428 CCGTTGGGGGCGTCGCAAGGGCCCTCGCACATGGTGTGAGGGTTCTTGAGGACGGGGTAAA
428 CCGTTGGGGGCGTCGCAAGGGCTCTCGCACACGGTGTGAGGGTTCTTGAGGACGGGGTAAA 
428 CCGTTGGGGGCGTCGCAAGGGCTCTCGCACACGGTGTGAGGGTTCTTGAGGACGGGGTAAA 
428 CCGTTGGGGGCGTCGCAAGGGCTCTCGCACAtGGTGTGAGGGTTCTTGAGGACGGGGTAAA
428 CCGTTGGGGGCGTCGCAAGGGCTCTCGCACAcGGTGTGAGGGTcCTTGAGGACGGGGTAAA 

CCGTTGGGGGCGTCGCAAGGGCtCTcGCACAcGGTGTGAGgGttCTTGAGGACGGGGTAAA 


489 CTATGCAACAGGGAATcTtCCCGGTTGCTCTTTCTCcATCTTTaTCCTTGCACTTCTCTCG 
489 CTATGCAACAGGGAATTTGCCCGGTTGCTCTTTCTCTATCTTTgTCCTTGCACTTCTCTCG
489 CTATGCAACgGGGAATTTGCCCGGTTGCTCTTTCTCTATCTTTATCCTTGCACTTCTCTCG 
489 CTATGCAACAGGGAATTTGCCCGGTTGCTCTTTCTCTATCTTTATCCTTGCACTTCTCTCG 
489 tTACGCAACAGGGAATcTGCCCGGTTGCTCTTTCTCTATCTTTATCCTTGCACTTCTCTCG 
489 CTACGCAACAGGGAATTTGCCCGGTTGCTCTTTCTCTATCTTTATCCTTGCACTTCTTTCC
489 CTACGCAACAGGGAATTTACCCGGTTGCTCTTTCTCTATCTTTATCCTTGCACTTCTTTCA 
489 CTAtGCAACAGGGAATTTACCCGGTTGCTCTTTCTCTATCTTTATCCTTGCACTTCTTTCA 

cTAtGCAACaGGGAATtTgCCCGGTTGCTCTTTCTCtATCTTTaTCCTTGCACTTCTcTCg 


550 TGCtTgACCGTCCCgGCCaCTGCA 
550 TGCCTaACCGTCCCtGCCTCTGCA 
550 TGCCTGACCGTCCCgGCCTCTGCA 
550 TGCtTGACCGTCCCAGCCTCTGCA 
550 TGCCTGACCGTCCCAGCCTCcGCA 
550 TGtCTGAtCaTCCCGGCCTCTGCA 
550 TGCCTGACCGTCCCGGCCTCTGCA 
550 TGCCTGACtGTCCCGaCCTCTGCc 

TGccTgAccgTCCCggCCtCtGCa 
123

135-138 

3 

123 

139-145 

4 

123 

146-153 

5 

123 

154 

6 

123 

SEQ ID NO:

Genotype 


103-154 

cons . 

184 

103-124 

1 

184 

125-134 

2 

184 

135-138 

3 

184 

139-145 

4 

184 

146-153 

5 

184 

154 

6 

184 

SEQ ID NO:

Genotype 


103-154 

cons . 

245 

103-124 

1 

245 

125-134 

2 

245 

135-138 

3 

245 

139-145 

4 

245 

146-153 

5 

245 

154 

6 

245 

SEQ ID NO:

Genotype


103-153 

cons . 

306 

103-124 

1 

306 

125-134 

2 

306 

135-138 

3 

306 

139-145 

4 

306 

146-153 

5 

306 

154 

6 

306 


ATGAGCACgAaTCCTAAACCTCAAAGAaAaACCAAACGTAACACCAaCCGcCGCCCACAGG
ATGAGCACAAaTCCtAAACCtCAAAGAAAAACCAaAAGAAAcACaAACCGcCGCCCACAgG 


ATGAGCACgAATCCTAAACCTCAAAGAAAAACCAAACGTAACACCAACCGcCGCCCcATGG
ATGAGCACGAATCCTAAACCTCAAAGAAAAACCaAAAGAAACACCAACCgCCGCCCACAGG 


AcgTcAAgTTcCCgGGcGGtGGtCAGATCGTtGGtGGAGTtTActTGtTGCCGCGCAGGGG

ACGTcAAGTTCCCGGGcGGtGGtCAGATCGTtGGTGGAGTtTAccTGTTGCCGCGCAGGGG 
ACGTtAAGTTcCCGGGcGGcGGcCAGATCGTTGGCGGAGTaTACTTGcTGCCGCGCAGGGG
ACgTcAAGTTCCCGGGTGGCGGACAGATCGTTGGTGGAGTATACGTGTTGCCGCGCAGGGG 
AcGTaAAgTTCCCgGGtGGtGGcCAGATCGTTGGCGGAGTTTACTTGtTGCCGCGCAGGGG 
ACGTcAAGTTCCCGGGCGGTGGTCAGATCGTTGGTGGAGTtTACTTGTTGCCGCGCAGGGG 
ACGTCAAGTTCCCGGGTGGCGGTCAGATCGTTGGCGGAGTTTACTTGTTGCCGCGCAGGGG 



CCCcaGgtTGGGTGTGCGCgCgaCtaGgAAgaCTTCcGAgCGgTCgCAaCCtcGtGGaaGg

CCCcaGgTTGGGTGTGCGCGCgaCtAGGAAGACTTCcGAGCGgTCgCAACCTCGtGGaaGg 
CCCcAGgTTGGGTGTGCGCgCGACaAGGAAgaCTTCcGAgCGaTCCCAGCCgCGTGGgAGg 
CCCACGATTGGGTGTGCGCGCGACGCGTAAAACTTCTGAACGGTCaCAGCCTCGCGGACGa 
CCCcaGgTTGGGTGTGCGCgCgaCTcGgAAGACTTCGGAGCGGTCgCAACCTCGtGGcAGg 
CCCtaGgtTGGGTGTGCGCGCgACTCGGAAGACTTCaGAACGGTCGCAACCCCGTGGgCGG
CCCCCGGTTGGGTGTGCGCGCGACGAGAAAGACTTCCGAGCGATCCCAGCCCAGAGGCAGG 


CGaCAgCCtATcCCcaAgGctCGcCggcccgagGGcaggtcCTGGGctcagCCcGGgtAcC 

CGaCAaCCTATCCCCAAGGCtCGcCggCCCGAGGGcAGGgCCTGGGCtCAGCCcGGGtAcC 
CGCCAGCCCATCCCgAAAGATCGGCGCtCCACtGGCAAGtCCTGGGGAAaaCCaGGATAtC
CGACAGCCTATCCCCAAGGCGCGTCGGAGCGAAGGCCGgTCCTGGGCTCAGCCcGGGTACC 
CGtCAgCCTATCCCCaAGGCaCGtCggtccGAGGGcAGgTCCTGGGCtCAgCCcGGGTAcC 
CGcCAGCCTATtCCCAAGGCgCGCCAacCCaCGGGcCGGTCCTGGGGTCAACCCGGGTACC 
CGCCAACCTATACCAAAGGCGCGCCAGCCCCAGGGCAGGCACTGGGCTCAGCCCGGATACC 


CtTGGCCccTcTAtGgcaAtGAgGGcttcGggTGGGCaGGaTGGcTccTgTCcCCcCgcGG 

CtTGGCCCCTCTAtGgCaAtGAGGGCttgGGgTGGGCaGGATGGCTCCTGTCaCCCCgtGG 
CtTGGCCcCTgTAtGGgAAtGAGGGcctCGGcTGGGCaGGtTGGCTCCTGTCCCCCCGcGG
CTTGGCCCCTCTATGGTAAcGAGGGCTGCGGGTGGGCAGGgTGGCTCCTGTCCCCACGCGG 
CtTGGCCtCTtTAcGGcAAtGAgGGcTGcGGGTGGGCaGGgTGGCTCcTGTCcCCcCGcGG 

CTTGGCCTCTTTATGGAAACGAGGGCTGTGGGTGGGCAGGTTGGCTCCTGTCCCCCCGCGG 


cTCtcggCCtagtTGGGGcCccActGAcCCCCGGCgtaggTCgCGcAAttTGGGtAagGTC 

cTCtCGGCCTAgtTGGGGCCCcAcaGACCCCCGGCGtAGGTCGCGtAATtTGGGtAAgGTC 

tTCtCgtCCttctTGGGGCCCCActGAcCCCCGGCAtAgaTCgCGcAActTGGGtAagGTC 

CTCCCGTCCATCTTGGGGCCCAAAcGACCCCCGGCGgaGGTCCCGCAATTTGGGTAAaGTC 

cTCtcGgCCgTCTTGGGGcCcaAATGATCCCCGGCGgAGgTCcCGcAAttTGGGTAAgGTC 

CTCTCGGCCTAatTGGGGCCCCAAtGACCCCCGGCGaAaaTCGCGCAAtTTGGGtAAGGTC 

CTCCCGGCCACATTGGGGCCCCAATGACCCCCGGCGTCGATCCCGGAATTTGGGTAAGGTC 



FIGURE 6J 


SEQ ID NO: Genotype

103-154 cons. 367 ATCGAtACccTcACgTGcgGctTcGCCGAcCTCATGGGgTACaTcCCgcTCGTcGGcGccC 
367
367
367
SEQ ID NO:

Genotype 


103-154 

cons . 

428 

103-124 

1 

428 

125-134 

2 

428 

135-138 

3 

428 

139-145 

4 

428 

146-153 

5 

428 

154 
428

SEQ ID NO:

Genotype 


103-154 

cons . 

489 

103-124 

1 

489 

125-134 

2 

489 

135-138 

3 

489 

139-145 

4 

489 

146-153 

5 

489 

154 

6 

489 

SEQ ID NO:

Genotype 


103-154 

cons . 

550 

103-124 

1 

550 

125-134 

2 

550 

135-138 

3 

550 

139-145 

4 

550 

146-153 

5 

550 

154 

6 

550 


ATCGAtACCCTcACaTGCGGCTTcGCCGACCTCATGGGGTACATtCCGCTCGTCGGcGccC 

ATCGAtACCcTaACgTGcgGttTTGCCGACCTCATGGGgTACaTcCCcGTCGTtGGCGccC 

ATCGATACCCTtACGTGCGGaTTCGCCGACCTCATGGGGTACATCCCGCTCGTCGGCGCTC 

ATCGATACcCTgACgTGcGGcTTcGCCGAcCTCATGGGATACATcCCGcTCGTaGGCGCCC 

ATCGAtACCCTaACgTGCGGATTCGCCGAcCTCATGGGGTACATCCCGCTCGTAGGCGGCC 

ATCGATACCCTAACGTGTGGGTTCGCCGATCTCATGGGGTACATTCCCGTCGTGGGCGCGC 


CcgTaGGgGGcGtcGCcaggGCccTgGCgCAtGGcGTcaGggttcTgGAgGACGGggTgAA 

CccTaGGgGGcGcTGCCAGgGCccTGGCgCAtGGcGTCCGgGTtcTGGAgGACGGCGTGAA 

CggTtGGaGGcGTcGCCAGAGCtCTgGCaCAtGGtGTgAGgGTcCTGGAgGACGGgaTaAA 

CcGTAGGaGGCGTCGCAAGAGCCCTCGCGCATGGCGTGAGGGCCCTTGAAGACGGGATAAA 

CcGTgGGtGGCGTCGCCAGgGCccTgGCgCATGGcGTcAGGgctgTGGAGGACGGgaTcAA 

CCGTTGGGGGCGTCGCAAGGGCtCTcGCACAcGGTGTGAGgGttCTTGAGGACGGGGTAAA 

CTTTGGGCGGCGTCGCGGCTGCGCTCGCACATGGCGTGAGGGCAATCGAGGACGGGATCAA 


cTatGCAACaGGgAAttTgCCcGGTTGCtCtTTcTCtATcTTccTccTgGCtcTgcTgTCc 

cTAtGCAACAGGGAAtcTgCCcGGTTGCtCtTTcTCTATCTTCCTctTgGCttTGcTgTCc 

tTAtGCAACaGGgAAttTgCCtGGTTGCTCtTTtTCTATcTTctTgcTtGCccTtcTGTCc 

TTTcGCAACAGGGAACTTGCCCGGTTGCTCcTTTTCTATCTTCCTTCTTGCtCTGTTCTCt 

cTAtGCAACAGGGAAtCTTCCcGGTTGCTCtTTcTCTATCTTcCTctTgGCACTtcTcTCG 

cTAtGCAACaGGGAATtTgCCCGGTTGCTCTTTCTCtATCTTTaTCCTTGCACTTCTcTCg 

TTATGCAACAGGGAATCTCCCCGGTTGCTCTTTCTCTATCTTCC m ' l GGCACTACTCTCG 


TGcctgaccgtcCCagcttCtgct 

TGttTgACcatcCCaGctTCcGCt
TGCatCaCagtgCCaGtgTCtGCt 
TGCtTAaTTCATCCaGCAGCTAGT 
TGCcTgACtgttCCagCgTCgGCc 
TGccTgAccgTCCCggCCtCtGCa 
TGCCTCACAACGCCAGCTTCGGCT 



SEQ ID NO: Genotype ATGAGCACgaaTCCtAAACCtCAAAGAaAaACCaaAcGtAAcACcAaCCgcCGCCCacagGAcgTcAAgTTcCCgGGcGGtGGtCAGATCGTtGGtGGAGTtTActTGtTGCCGC





■GgaGG 



TCgCGcAAttTGGGtAagGTCATCGAtACccTcACgTGcgGctTcGCCGAcCTCATGGGgTACaTcCCgcTCGTcGGcGccCCcgTaGGgGGcGtcGCcaggGCccTgGCgCAt




FIGURE 7A 


SEQ ID NO:

156 

157 

158 

159 

160 
155 


ISOLATE 

US11 

S14 

SW1 

S18 

DR4 

DK7 


155-160 consensus 


1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRApRKTSERSQPRGR 

MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRAtRKTSERSQPRGR 


SEQ ID NO:

156 

157 

158 

159 

160 
155 


ISOLATE 

US11 

S14 

SW1 

S18 

DR4 

DK7 


62 RQPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV 
62 RQPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV 
62 RQPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV 
62 RQPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV 
62 RQPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV


155-160 consensus 


RQPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV


SEQ ID NO: ISOLATE

156 US11 

157 S14 

158 SW1 

159 S18 

160 DR4 

155 DK7 


123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS 
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS


155-160 consensus 


IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS


SEQ ID NO:

156 

157 

158 

159 

160 
155 


ISOLATE 

US11 

S14 

SW1 

S18 

DR4 

DK7 


184 CLTVPASA 
184 CLTVPASA 
184 CLTVPASA 
184 CLTVPASA 
184 CLTVPASA 
184 CLTVPASA 


155-160 


consensus 


CLTVPASA 



FIGURE 7B 



1 MSTtPKPQRKTKRNTsRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR
1 MSTNPKPQRqTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 


161-176 consensus 


MSTnPKPQRkTKRNTnRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 


175 

170 
162 

171 

163 

165 
169 

164 

166 

167 

168 
161 
174 

172 

176 

173 


S9 
D1 
P10 
IND3 
US6
DK1 
T10 
SW2 
SA10 
HK4 
HK3 
T3 
HK5 


62 RQPIPKARRPEGRAWAQPGHPWPLYaNEGLGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARRPEGRAWAQPGHPWPLYGNEGLGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV 
62 RQPIPKARRPEGRAWAQPGHPWPLYGNEGLGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV 
62 RQPIPKARhPEGRAWAQPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPnDPRRRSRNLGKV 
62 RQPIPKARRPEGRAWAQPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV 
62 RQPIPKARRPEGRAWAQPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV 
62 RQPIPKARRPEGRAWAQPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV 
62 RQPIPKARRPEGRAWAQPGYPWPLYGNEGMGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV 
62 RQPIPKARRPEGRAWAQPGYPWPLYGNEGMGWAGWLLSPRGSRPSWGPnDPRRRSRNLGKV 
62 RQPIPKARQPEGRAWAQPGYPWPLYGNEGMGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV 
62 RQPIPKARQPEGRAWAQPGYPWPLYGNEGMGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV 
62 RQPIPKARQPEGRTWAQPGYPWPLYGNEGlGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARQPEGRTWAQPGYPWPLYGNEGMGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARQPEGRTWAQPGYPWPLYGNEGMGWAGWLLSPRGSRPNWGPTDPRRRSRNLGKV 
62 RQPIPKARRPEGRaWAQPGYPWPLYGdEGMGWAGWLLSPRGSRPNWGPTDPRRRSRNLGKV 
62 RQPIPKARRPEGRtWAQPGYPWPLYGnEGMGWAGWLLSPhGSRPsWGPTDPRRRSRNLGKV 


consensus 


RQPIPKARrPEGRaWAQPGyPWPLYgnEG-GWAGWLLSPrGSRPsWGPtDPRRRSRNLGKV


175 

170 
162 

171 

163 

165 
169 

164 

166 

167 

168 
161 
174 

172 

176 

173 


ISOLATE 
P8 
IND8 
S45 
S9 
D1 
P10 
IND3 
US6
DK1 
T10 
SW2 
SA10 
HK4 
HK3 
T3 
HK5 


123 IDTLTCGFADLMGYIPLVGgPLGGvARALAHGVRWEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCpFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGVARALAHGVRVvEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGVARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGVARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGVARALAHGVRVLEDGVNYATGNiPGCSFSIFLLALLS


consensus 


IDTLTCGFADLMGYIPLVGaPLGGaARALAHGVRVlEDGVNYATGNlPGCsFSIFLLALLS


SEQ ID NO: ISOLATE
175 P8 
170 IND8 
162 S45 


184 CLTiPASA 
184 CLTvPASA 
184 CLTIPASA 



FIGURE 7B 


171 

163 

165 
169 

164 

166 

167 

168 
161 
174 

172 
176 

173 


S9 184 CLTIPASA 

D1 184 CLTIPASA 

P10 184 CLTIPASA 

IND3 184 CLTIPASA 

US6 184 CLTIPASA

DK1 184 CLTIPASA 

T10 184 CLTIPASA 

SW2 184 CLTIPASA 

SA10 184 CLTIPASA 

HK4 184 CLTIPASA 

HK3 184 CLTtPASA 

T3 184 CLTiPASA 

HK5 184 CLTtPvSA 


161-176 consensus 


CLTiPaSA 



FIGURE 7C 


SEQ ID NO:

173 
176 
172 

174 
161 
168 
167 
166 

164 

169 

165 
163 

156 

157 

158 

159 

160 
155 

170 
162 

171 

175 


T3 
HK3 
HK4 
SA10 
SW2 
T10 
DK1 
US6
IND3 
P10 
D1 
US11 
S14 
SW1 
S18 
DR4 
DK7 
IND8 
S45 
S9 
P8 


1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRApRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRqTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTtPKPQRKTKRNTsRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 


155-176 consensus 


MSTnPKPQRkTKRNTnRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRAtRKTSERSQPRGR 


SEQ ID NO:

173 
176 
172 

174 
161 
168 
167 
166 

164 

169 

165 
163 

156 

157 

158 

159 

160 
155 

170 
162 

171 

175 


T3 
HK3 
HK4 
SA10 
SW2 
T10 
DK1 
US6
IND3 
P10 
D1 
US11 
S14 
SW1 
S18 
DR4 
DK7 
IND8 
S45
S9 
P8 


62 RQPIPKARRPEGRtWAQPGYPWPLYGnEGMGWAGWLLSPhGSRPsWGPTDPRRRSRNLGKV 
62 RQPIPKARRPEGRaWAQPGYPWPLYGdEGMGWAGWLLSPRGSRPNWGPTDPRRRSRNLGKV 
62 RQPIPKARQPEGRTWAQPGYPWPLYGNEGMGWAGWLLSPRGSRPNWGPTDPRRRSRNLGKV 
62 RQPIPKARQPEGRTWAQPGYPWPLYGNEGMGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV 
62 RQPIPKARQPEGRTWAQPGYPWPLYGNEGlGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARQPEGRAWAQPGYPWPLYGNEGMGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARQPEGRAWAQPGYPWPLYGNEGMGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARRPEGRAWAQPGYPWPLYGNEGMGWAGWLLSPRGSRPSWGPnDPRRRSRNLGKV 
62 RQPIPKARRPEGRAWAQPGYPWPLYGNEGMGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARRPEGRAWAQPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV 
62 RQPIPKARRPEGRAWAQPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV 
62 RQPIPKARRPEGRAWAQPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARRPEGRTWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARRPEGRAWAQPGHPWPLYGNEGLGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARRPEGRAWAQPGHPWPLYGNEGLGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKV
62 RQPIPKARhPEGRAWAQPGyPWPLYGNEGLGWAGWLLSPRGSRPSWGPnDPRRRSRNLGKV
62 RQPIPKARrPEGRAWAQPGhPWPLYaNEGLGWAGWLLSPRGSRPSWGPtDPRRRSRNLGKV


155-176 consensus 


RQPIPKARrPEGRaWAQPGyPWPLYgnEG-GWAGWLLSPrGSRPsWGPtDPRRRSRNLGKV


SEQ ID NO:

173 
176 
172 

174 
161 
168 
167 
166 

164 
169 

165 
163 

156 

157 

158 


ISOLATE 
HK5 
T3 
HK3 
HK4 
SA10 
SW2 
T10 
DK1 
US6
IND3 
P10 
D1 
US11 
S14 
SW1 


123 IDTLTCGFADLMGYIPLVGAPLGGVARALAHGVRVLEDGVNYATGNiPGCSFSIFLLALLS 
123 IDTLTCGFADLMGYIPLVGAPLGGVARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGVARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGVARALAHGVRVvEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCpFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS



FIGURE 7C 


159 

160 
155 

170 
162 

171 
175 


S18 123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS 

DR4 123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS 

DK7 123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS 

IND8 123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS

S45 123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS 

S9 123 IDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS 

P8 123 IDTLTCGFADLMGYIPLVGgPLGGvARALAHGVRVvEDGVNYATGNLPGCSFSIFLLALLS 


155-176 consensus 


IDTLTCGFADLMGYIPLVGaPLGGaARALAHGVRVlEDGVNYATGNlPGCsFSIFLLALLS


SEQ ID NO:

173 
176 
172 

174 
161 
168 
167 
166 

164 

169 

165 
163 

156 

157 

158 

159 

160 
155 

170 
162 

171 

175 


T3 
HK3 
HK4 
SA10 
SW2 
T10 
DK1 
US6
IND3 
P10 
D1 
US11 
S14 
SW1 
S18 
DR4 
DK7 
IND8 
S45
S9 
P8 


184 CLTtPvSA 
184 CLTiPASA 
184 CLTtPASA 
184 CLTIPASA 
184 CLTIPASA 
184 CLTIPASA 
184 CLTIPASA 
184 CLTIPASA 
184 CLTIPASA 
184 CLTIPASA 
184 CLTIPASA 
184 CLTIPASA 
184 CLTVPASA 
184 CLTVPASA 
184 CLTVPASA 
184 CLTVPASA 
184 CLTVPASA 
184 CLTVPASA 
184 CLTVPASA 
184 CLTIPASA 
184 CLTIPASA 
184 CLTIPASA 


consensus 


CLTiPaSA 


155-176 



FIGURE 7D 


SEQ ID NO: ISOLATE


179 

T9 

178 

US10 

180 

T2 

177 

T4 

177-180 

consensus 

SEQ ID NO:

ISOLATE 

179 

T9 

178 

US10 

180 

T2 

177 

T4 

177-180 

consensus 


179 


T9 

178 


US10 

180 


T2 

177 


T4 

177-180

consensus 

SEQ ID NO:

ISOLATE 

179 


T9 

178 


US10 

180 


T2 

177 


T4 


1 MSTNPKPQRKTiRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRtTRKTSERSQPRGR
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTiPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTnPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 

MSTnPKPQRKTkRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRaTRKTSERSQPRGR 


62 RQPIPKDRRsTGKSWGKPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPsDPRHRSRNVGKV 
62 RQPIPKDRRpTGKSWGKPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPtDPRHRSRNVGKV 
62 RQPIPKDRRSTGKSWGKPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPNDPRHRSRNVGKV
62 RQPIPKDRRSTGKSWGKPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPNDPRHRSRNVGKV 

RQPIPKDRRsTGKSWGKPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPnDPRHRSRNVGKV 


123 IDTLTCGFADLMGYIPVVGAPLGGVARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPVVGAPLGGVARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPVVGAPLGGVARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCslADLMGYvPVVGgPLGGVARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS

IDTLTCgfADLMGYiPVVGaPLGGVARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS


184 CITtPaSA 
184 CITIPVSA 
184 CITIPVSA 
184 CITIPVSA 


177-180 consensus 


CITiPvSA 



FIGURE 7E 


SEQ ID NO: ISOLATE


183 

184 
181 
182 

185 


DK11 

SW3 

T8 

US1 

DK8 

181-185

consensus 

SEQ ID NO:

ISOLATE 

183 

184 
181 
182 

185 


DK11 

SW3 

T8 

US1

DK8 

181-185

consensus 

SEQ ID NO:

ISOLATE 

183 

184 
181 
182 

185 


DK11 

SW3 

T8 

US1 

DK8 

181-185

consensus 

SEQ ID NO:

ISOLATE 

183 

184 
181 
182 

185 


DK11 

SW3 

T8 

US1 

DK8 


1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRtTRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKsSERSQPRGR 

MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRaTRKtSERSQPRGR 


62 RQPIPKDRRSTGKpWGKPGYPWPLYGNEGCGWAGWLLSPRGSHPNWGPTDPRHkSRNLGKV 
62 RQPIPKDRRSTGKSWGKPGYPWPLYGNEGCGWAGWLLSPRGSHPNWGPTDPRHRSRNLGKV 
62 RQPIPKDRRSTGKSWGKPGYPWPLYGNEGCGWAGWLLSPRGSRPTWGPTDPRHRSRNLGrV 
62 RQPIPKDRRSTGKSWGKPGYPWPLYGNEGCGWAGWLLSPRGSRPTWGPTDPRHRSRNLGKV 
62 RQPIPKDRRSTGKSWGKPGYPWPLYGNEGCGWAGWLLSPRGSRPTWGPTDPRHRSRNLGKV

RQPIPKDRRSTGKsWGKPGYPWPLYGNEGCGWAGWLLSPRGSrPtWGPTDPRHrSRNLGkV 


123 IDTITCGFADLMGYIPVVGAPVGGVARALAHGVRVLEDGINYATGNLPGCSFSIFLLALLS
123 IDTITCGFADLMGYIPVVGAPVGGVARALAHGVRVLEDGINYATGNLPGCSFSIFLLALLS
123 IDTITCGFADLMGYIPVVGAPVGGVARALAHGVRVLEDGINYATGNLPGCSFSIFLLALLS
123 IDTITCGFADLMGYIPVVGAPVGGVARALAHGVRVLEDGINYATGNLPGCSFSIFLLALLS
123 IDTITCGFADLMGYIPVVGAPVGGVARALAHGVRVLEDGINYATGNLPGCSFSIFLLALLS

IDTITCGFADLMGYIPVVGAPVGGVARALAHGVRVLEDGINYATGNLPGCSFSIFLLALLS


184 CcTVPVSA 
184 CFTVPVSA 
184 CFTVPVSA 
184 CaTVPVSA 
184 CcTVPVSA 


181-185 


consensus 


C-TVPVSA 



FIGURE 7F 


SEQ ID NO:

ISOLATE 

183 


DK11 

184 


SW3 

181 


T8 

182 


US1 

185


DK8 

186 


S83 

178 


US10 

180 


T2 

179 


T9 

177 


T4 

177-186

consensus 

SEQ ID NO:

ISOLATE 

183 


DK11 

184 


SW3 

181 


T8 

182 


US1 

185 


DK8 

186 


S83 

178 


US10 

180 


T2 

179 


T9 

177 


T4 

177-186

consensus 

SEQ ID NO:

ISOLATE 

183 


DK11

184 


SW3 

181 


T8 

182 


US1 

185 


DK8 

186 


S83 

178 


US10 

180 


T2 

179 


T9 

177 


T4 

177-186

consensus 

SEQ ID NO:

ISOLATE 

183 


DK11

184 


SW3 

181 


T8 

182 


US1 

185 


DK8 

186 


S83 

178 


US10 

180 


T2 

179 


T9 

177 


T4 


1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRtTRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKsSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTiPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTiRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRtTRKTSERSQPRGR 
1 MSTNPKPQRKTkRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRaTRKTSERSQPRGR 

MSTnPKPQRKTkRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRaTRKtSERSQPRGR 


62 RQPIPKDRRSTGKpWGKPGYPWPLYGNEGCGWAGWLLSPRGSHPNWGPTDPRHkSRNLGKV 
62 RQPIPKDRRSTGKSWGKPGYPWPLYGNEGCGWAGWLLSPRGSHPNWGPTDPRHRSRNLGKV 
62 RQPIPKDRRSTGKSWGKPGYPWPLYGNEGCGWAGWLLSPRGSRPTWGPTDPRHRSRNLGrV 
62 RQPIPKDRRSTGKSWGKPGYPWPLYGNEGCGWAGWLLSPRGSRPTWGPTDPRHRSRNLGKV 
62 RQPIPKDRRSTGKSWGKPGYPWPLYGNEGCGWAGWLLSPRGSRPTWGPTDPRHRSRNLGKV 
62 RQPIPKDRRtTGKSWGrPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPTDPRHkSRNLGKV 
62 RQPIPKDRRpTGKSWGKPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPTDPRHRSRNVGKV 
62 RQPIPKDRRSTGKSWGKPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPnDPRHRSRNVGKV 
62 RQPIPKDRRSTGKSWGKPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPsDPRHRSRNVGKV 
62 RQPIPKDRRSTGKSWGKPGYPWPLYGNEGLGWAGWLLSPRGSRPSWGPnDPRHRSRNVGKV 

RQPIPKDRRsTGKsWGkPGYPWPLYGNEG-GWAGWLLSPRGSrPsWGPtDPRHrSRNlGkV


123 IDTITCGFADLMGYIPVVGAPVGGVARALAHGVRVLEDGINYATGNLPGCSFSIFLLALLS
123 IDTITCGFADLMGYIPVVGAPVGGVARALAHGVRVLEDGINYATGNLPGCSFSIFLLALLS
123 IDTITCGFADLMGYIPVVGAPVGGVARALAHGVRVLEDGINYATGNLPGCSFSIFLLALLS
123 IDTITCGFADLMGYIPVVGAPVGGVARALAHGVRVLEDGINYATGNLPGCSFSIFLLALLS
123 IDTITCGFADLMGYIPVVGAPVGGVARALAHGVRVLKDGINYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPVVGAPVGGVARALAHGVRVLKDGINYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPVVGAPLGGVARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPVVGAPLGGVARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCGFADLMGYIPVVGAPLGGVARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS
123 IDTLTCslADLMGYvPVVGgPLGGVARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLS

IDT-TCgfADLMGYiPVVGaPvGGVARALAHGVRVLEDGiNYATGNLPGCSFSIFLLALLS


184 CcTVPVSA 
184 CFTVPVSA 
184 CFTVPVSA 
184 CaTVPVSA 
184 CcTVPVSA 
184 CIsVPVSA 
184 CITIPVSA 
184 CITIPVSA 
184 CITCPaSA 
184 CITiPvSA 


177-186 


consensus 


CitvPvSA 



FIGURE 7G 


SEQ ID NO:

189 

187 

190 

188 


ISOLATE 

S2 

HK10 

DK12 

S52 


1 MSTLPKPQRKTKRNTIRRPQDiKFPGGGQIVGGVYVLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTLPKPQRKTKRNTIRRPQDVKFPGGGQIVGGVYVLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTLPKPQRKTKRNTIRRPQDVKFPGGGQIVGGVYVLPRRGPRLGVRATRKTSERSQPRGR
1 MSTLPKPQRKTKRNTIERPQDVKFPGGGQIVGGVYVLPRRGPRLGVRATRKTSERSQPRGR 


187-190 consensus 


MSTLPKPQRKTKRNTIRRPQDvKFPGGGQIVGGVYVLPRRGPRLGVRATRKTSERSQPRGR 


SEQ ID NO:

189 

187 

190 

188 


ISOLATE

S2 


HK10 

DK12 

S52 


62 RQPIPKARRSEGRSWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPNDPRRRSRNLGKV 
62 RQPIPKARRSEGRSWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPNDPRRRSRNLGKV 
62 RQPIPKARRSEGRSWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPNDPRRRSRNLGKV 
62 RQPIPKARRSEGRSWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPNDPRRRSRNLGKV 


187-190 consensus 


RQPIPKARRSEGRSWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPNDPRRRSRNLGKV


SEQ ID NO:

189 

187 

190 

188 


HK10 

DK12 

S52 


123 IDTLTCGFADLMGYIPLVGAPVGGVARALAHGVRALEDGINFATGNLPGCSFSIFLLALFS 
123 IDTLTCGFADLMGYIPLVGAPVGGVARALAHGVRALEDGINFATGNLPGCSFSIFLLALFS
123 IDTLTCGFADLMGYIPLVGAPVGGVARALAHGVRALEDGINFATGNLPGCSFSIFLLALFS
123 IDTLTCGFADLMGYIPLVGAPVGGVARALAHGVRALEDGINFATGNLPGCSFSIFLLALFS


187-190 consensus 


IDTLTCGFADLMGYIPLVGAPVGGVARALAHGVRALEDGINFATGNLPGCSFSIFLLALFS


SEQ ID NO: ISOLATE 

189 S2 

187 HK10 

190 DK12 

188 S52 


184 CLIHPAAS 
184 CLIHPAAS 
184 CLIHPAAS 
184 CLvHPAAS 


187-190 consensus 


CLiHPAAS 



FIGURE 7H 


SEQ ID NO:

194 
193 
192 

195 

196 
191 

197 


1 MSTNPKPQRKTKRNTNRRPMDVKFPGGGQIVGGVYLLPRRGPRLGVRAtRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPMDVKFPGGGQIVGGVYLLPRRGPRLGVRAaRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPMDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPMDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPMDVKFPGGGQIVGGVYLLPRRGPRLGVRtTRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPMDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPMDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 


191-197 consensus 


MSTNPKPQRKTKRNTNRRPMDVKFPGGGQIVGGVYLLPRRGPRLGVRatRKTSERSQPRGR 


194 
193 
192 

195 

196 
191 

197 


ISOLATE 

Z5 

Z1 

Z8 

Z6 

Z7 

Z4 

DK13 


62 RQPIPqARRSEGRSWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGqNDPRRRSRNLGKV
62 RQPIPKARRSEGRSWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPNDPRRRSRNLGKV
62 RQPIPKARRSEGRSWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPNDPRRRSRNLGKV
62 RQPIPKARRSEGRSWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPNDPRRRSRNLGKV
62 RQPIPKARRSEGRSWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPNDPRRRSRNLGKV
62 RQPIPKARQpEGRSWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPNDPRRRSRNLGKV
62 RQPIPKARQlEGRSWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGPNDPRRRSRNLGKV


191-197 consensus 


RQPIPkARrsEGRSWAQPGYPWPLYGNEGCGWAGWLLSPRGSRPSWGpNDPRRRSRNLGKV 


194 
193 
192 

195 

196 
191 

197 


ISOLATE 

Z5 

Z1 

Z8 

Z6 

Z7 

Z4 

DK13 


123 

123 

123 

123 

123 

123 

123 


IDTLTCGFADLMGYIPLVGAPVGGVARALAHGVRAlEDGINYATGNLPGCSFSIFLLALfS
IDTLTCGFADLMGYIPLVGAPVGGVARALAHGVRAVEDGINYATGNLPGCSFSIFLLALLS 
IDTLTCGFADLMGYIPLVGAPVGGVARALAHGVRAVEDGINYATGNLPGCSFSIFLLALLS 
IDTLTCGFADLMGYIPLVGAPVGGVARALAHGVRAVEDGINYATGNLPGCSFSIFLLALLS 
IDTLTCGFADLMGYIPLVGAPVGGVARALAHGVRAlEDGINYATGNLPGCSFSIFLLALLS
IDTLTCGFADLMGYIPiVGAPVGGVARALAHGVRAvEDGINYATGNLPGCSFSIFLLALLS 
IDTLTCGFADLMGYIPvVGAPVGGVARALAHGVRllEDGvNYATGNLPGCSFSIFLLALLS 


191-197 consensus 


IDTLTCGFADLMGYIPlVGAPVGGVARALAHGVRavEDGiNYATGNLPGCSFSIFLLALlS


SEQ ID NO:

194 
193 
192 

195 

196 
191 

197 


184 CLTTPASA 
184 CLTTPASA 
184 CLTVPASA 
184 CLTVPtSA 
184 CLTVPASA 
184 CLTVPASA 
184 CLTVPASA 


191-197 


consensus 


CLTvPaSA 



FIGURE 7I


SEQ ID NO: ISOLATE


205 

SA11 

202 

SA3 

198 

SA4 

199 

SA5 

200 

SA7 

203 
201

SA1 

204 

SA6 

198-205 

consensus 

SEQ ID NO:

ISOLATE 

205 

SA11 

202 

SA3 

198 

SA4 

199 

SA5 

200 

SA7 
SA13
SA1

204 

SA6 

198-205 

consensus 

SEQ ID NO:

ISOLATE 

205 

SA11 

202 

SA3 

198 

SA4 

199 

SA5 

200 

SA7 

203 

SA13 

201 

SA1 

204 

SA6 

198-205 

consensus 

SEQ ID NO:

ISOLATE 

205 

SA11 

202 

SA3 

198 

SA4 

199 

SA5 

200 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR 
1 MSTNPKPQRKTKRNTNlRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGR
1 MSTNPKPQRKTqRNTNrRPQDVKFPGGGQIVGGVYLLPRRGPRmGVRATRKTSERSQPRGR 

MSTNPKPQRKTkRNTNrRPQDVKFPGGGQIVGGVYLLPRRGPRlGVRATRKTSERSQPRGR 
62 RQPIPKARQPTGRSWGQPGYPWPLYANEGLGWAGWLLSPRGSRPNWGPNDPRRKSRNLGKV
RQPIPKARQptGRSWGQPGYPWPlYANEGLgWAGWLLSPRGSRPnWGPNDPRRkSRNLGKV
IDTLTCGFADLMGYIPLVGGPVGGVARALAHGVRVLEDGVNYATGNLPGCSFSIFILALLS
IDTLTCGFADLMGYIPLVGGPVGGVARALAHGVRVLEDGVNYATGNLPGCSFSIFvLALLS 


IDTLTCGFADLMGYIPLVGGPVGGVARALAHGVRvLEDGVNYATGNLPGCSFSIFiLALLS
PATENT


Docket No. 2026-4116 

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE
Applicant(s) : Jens Bukh, et al.

Serial No. : 08/290,665 Group Art Unit: To be assigned 

Filed : August 15, 1994 Examiner: To be assigned 

For : NUCLEOTIDE AND DEDUCED AMINO ACID SEQUENCES OF THE
ENVELOPE 1 AND CORE GENES OF ISOLATES OF HEPATITIS C VIRUS
AND THE USE OF REAGENTS DERIVED FROM THESE SEQUENCES IN 
DIAGNOSTIC METHODS AND VACCINES 

Hon. Commissioner of Patents and Trademarks 
Washington, D.C. 20231 

ASSOCIATE POWER OF ATTORNEY 


Dear Sir: 

Pursuant to the provisions of 37 CFR 1.33 and 1.34 and MPEP 402.02, the
undersigned attorney of record hereby appoints the following as associate attorneys to
prosecute this application, to receive the patent, and to transact all business in the Patent and
Trademark Office in connection with the above-identified application:

Kurt E. Richter (Reg. No. 24,052); Eugene Moroz (Reg. No. 25,237); William S. Feiler (Reg. No. 
26,728); Israel Blum (Reg. No. 26,710); Bartholomew Verdirame (Reg. No. 28,483); Maria C. H. Lin (Reg. No. 
29,323); Christopher E. Chalsen (Reg. No. 30,936); Eugene C. Rzucidlo (Reg. No. 31,900); Mary J. Morry (Reg. 
No. 34,398); Michael M. Murray (Reg. No. 32,537); Jean E. Shimotake (Reg. No. 36,273); Kathryn M. Brown
(Reg. No. 34,556); Leslie A. Serunian (Reg. No. 35,353); Dorothy R. Auth (Reg. No. 36,434); Richard W. Bork 
(Reg. No. 36,459); M. Caragh Noone (Reg. No. 37,197); David V. Rossi (Reg. No. 36,659) and Carol M. Gruppi 
(Reg. No. 37,341) of Morgan & Finnegan whose address is: 345 Park Avenue, New York, New York 10154. 

Respectfully submitted,


Date: [illegible]

Ann S. Hobbs 
Registration No. 36,830 

Patent Branch 

Office of Technology Transfer 
National Institutes of Health 
Box 13 

6011 Executive Boulevard, Suite 325
Rockville, MD 20852 
Tel. No. (301) 496-7056 
COMBINED DECLARATION AND POWER OF ATTORNEY

As a below named inventor, I hereby declare that my residence, post office address and citizenship are as stated below next to
my name, the information given herein is true, that I believe I am the original, first and sole (if only one name is listed below) or
an original, first and joint inventor (if plural names are listed below) of the subject matter which is claimed and for which

a patent is sought on the invention entitled: NUCLEOTIDE AND DEDUCED AMINO ACID SEQUENCES OF THE 

ENVELOPE 1 AND CORE GENES OF ISOLATES OF HEPATITIS C VIRUS AND THE USE OF REAGENTS DERIVED 
FROM THESE SEQUENCES IN DIAGNOSTIC METHODS AND VACCINES 

which is described in: [ ] PCT International Application No. filed

[ ] the attached application or [X] the specification in application Serial No. 08/290,665 filed August 15, 1994

(if applicable) and amended on

I hereby state that I have reviewed and understand the contents of the above-identified specification, including the claims, as
amended by any amendment referred to above. 

I acknowledge the duty to disclose all information known to me which is material to the examination of this application in
accordance with Title 37, Code of Federal Regulations, §1.56 (a). 

I hereby claim foreign priority benefits under Title 35 United States Code, § 119 of any foreign application(s) for patent or 
inventor’s certificate or of any PCT international application(s) designating at least one country other than the United States of 
America listed below and have also identified below any foreign application(s) for patent or inventor’s certificate or any PCT
international application(s) designating at least one country other than the United States of America filed by me on the same
subject matter having a filing date before that of the application(s) of which priority is claimed.


COUNTRY | APPLICATION | DATE OF FILING (day, month, year) | PRIORITY CLAIMED UNDER 35 USC § 119
        |             |                                    | [ ] Yes  [ ] No
        |             |                                    | [ ] Yes  [ ] No
        |             |                                    | [ ] Yes  [ ] No


I hereby claim the benefit under Title 35, United States Code §120 of any United States application(s) or PCT International
application(s) designating the United States of America that is/are listed below and, insofar as the subject matter of each of the
claims of this application is not disclosed in that/those prior application(s) in the manner provided by the first paragraph of Title
35, United States Code, §112, I acknowledge the duty to disclose material information as defined in Title 37, Code of Federal
Regulations, § 1.56(a) which occurred between the filing date of the prior application(s) and the national or PCT international
filing date of this application.


Application Serial No. | Filing Date | Status: patented, pending, abandoned
08/086,428 | 29 June 1993 | pending








I hereby appoint the following attorney(s) and/or agent(s) to prosecute this application and to transact all business in the Patent
and Trademark Office connected therewith: 


James C. Haight, Reg. No. 25,588; Gloria Richmond, Reg. No. 30,416; Robert Benson, Reg. No. 33,612; Jack Spiegel, Reg. 
No. 34,477; Laurence J. Hyman, Reg. No. 35,551; Denise C. Bernstein, Reg. No. 35,787; Susan S. Rucker, Reg. No. 35,762; 
David R. Sadowski, Reg. No. 32,808 and Ann S. Hobbs, Reg. No. 36,830 and Arthur J. Cohn, Reg. No. 37,800 all of the 
Office of Technology Transfer, National Institutes of Health, 6011 Executive Boulevard, Suite 325, Rockville, MD 20852 

I further direct that all correspondence concerning this application be directed to:

Patent Branch 

Office of Technology Transfer 
National Institutes of Health 
Box 13 

6011 Executive Boulevard, Suite 325 
Rockville, MD 20852 
Telephone: (301) 496-7056 
I hereby declare that all statements made herein of my own knowledge are true and that all statements made on information and
belief are believed to be true; and further that these statements were made with the knowledge that willful false statements and
the like so made are punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States Code and
that such willful false statements may jeopardize the validity of the application or any patent issued thereon.


Full Name of first joint inventor:
Inventor’s signature:

Country of Citizenship: Denmark

Residence: 5805 Sonoma Road, Bethesda, Maryland 20817, U.S.A.

Date: [illegible]

Post Office Address: 5805 Sonoma Road, Bethesda, Maryland 20817, U.S.A.


Full Name of second joint inventor: Roger H. Miller

Inventor’s signature:

Country of Citizenship: United States of America

Residence: 15504 White Willow Lane, Rockville, Maryland 20853, U.S.A.

Date:

Post Office Address: 15504 White Willow Lane, Rockville, Maryland 20853, U.S.A.


Full Name of third joint inventor: Robert H. Purcell

Inventor’s signature: [illegible]

Date: [illegible]

Country of Citizenship: United States of America

Residence: 17517 White Grounds Road, Boyds, Maryland 20841, U.S.A.

Post Office Address: 17517 White Grounds Road, Boyds, Maryland 20841, U.S.A.